Run on:          January 28, 2004, 20:49:00 ; Search time 12457 Seconds
                 (without alignments)
                 11389.146 Million cell updates/sec

Title:           US-10-056-884A-1
Perfect score:   3468
Sequence:        1 caagcactgtgctaaagtgt...aaaaaaaaaaaaaaaaaaaa 3468

Scoring table:   IDENTITY_NUC
                 Gapop 10.0 , Gapext 1.

Searched:        2888711 seqs, 20454813386 residues

Total number of hits satisfying chosen parameters:    5777422

Minimum DB seq length:
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries 
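
The throughput figure in the run header can be cross-checked from the other header values. The sketch below is a consistency check only, not the search program's own accounting. It assumes that one "cell update" is one query-base by database-residue comparison and that both strands of every database sequence were searched. The factor of 2 is an inference from the hit count (5777422 is twice 2888711); the header does not state it. All names are illustrative.

```python
# Values copied from the run header above.
QUERY_LENGTH = 3468            # query length (equals the perfect score under IDENTITY_NUC)
DB_SEQS = 2888711              # "Searched: ... seqs"
DB_RESIDUES = 20454813386      # "Searched: ... residues"
SEARCH_SECONDS = 12457         # "Search time"
STRANDS = 2                    # ASSUMPTION: forward and reverse strand both searched


def million_cell_updates_per_sec(query_len, residues, seconds, strands=STRANDS):
    """Dynamic-programming cells filled per second, in millions."""
    return strands * query_len * residues / seconds / 1e6


rate = million_cell_updates_per_sec(QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS)
print(f"{rate:.3f} million cell updates/sec")
print("hits if every sequence is scored on both strands:", STRANDS * DB_SEQS)
```

Under these assumptions the computed rate agrees with the printed 11389.146 to within rounding, which supports reading the hit count as two scored strands per database sequence.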
 1: gb_ba:*
 2: gb_htg:*
 3: gb_in:*
 4: gb_om:*
 5: gb_ov:*
 6: gb_pat:*
 7: gb_ph:*
 8: gb_pl:*
 9: gb_pr:*
10: gb_ro:*
11: gb_sts:*
12: gb_sy:*
13: gb_un:*
14: gb_vi:*
15: em_ba:*
16: em_fun:*
17: em_hum:*
18: em_in:*
19: em_mu:*
20: em_om:*
21: em_or:*
22: em_ov:*
23: em_pat:*
24: em_ph:*
25: em_pl:*
26: em_ro:*
27: em_sts:*
28: em_un:*
29: em_vi:*
30: em_htg_hum:*
31: em_htg_inv:*
32: em_htg_other:*
33: em_htg_mus:*
34: em_htg_pln:*
35: em_htg_rod:*
36: em_htg_mam:*
37: em_htg_vrt:*
38: em_sy:*
39: em_htgo_hum:*
40: em_htgo_mus:*
41: em_htgo_other:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
RESULT 1
AB037738
LOCUS       AB037738 5646 bp mRNA linear PRI 14-MAR-2000
DEFINITION  Homo sapiens mRNA for KIAA1317 protein, partial cds.
ACCESSION   AB037738
VERSION     AB037738.1  GI:7243014
KEYWORDS    .
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (sites)
  AUTHORS   Nagase,T., Kikuno,R., Ishikawa,K.I., Hirosawa,M. and Ohara,O.
  TITLE     Prediction of the coding sequences of unidentified human genes.
            XVI. The complete sequences of 150 new cDNA clones from brain which
            code for large proteins in vitro
  JOURNAL   DNA Res. 7 (1), 65-73 (2000)
  MEDLINE   20181126
  PUBMED    10718198
REFERENCE   2 (bases 1 to 5646)
  AUTHORS   Ohara,O., Nagase,T. and Kikuno,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-JAN-2000) Osamu Ohara, Kazusa DNA Research Institute,
            Laboratory of DNA Technology; 1532-3 Yana, Kisarazu, Chiba
            292-0812, Japan (E-mail:cdnainfo@kazusa.or.jp,
            URL:http://www.kazusa.or.jp/huge/, Tel:+81-438-52-3913,
            Fax:+81-438-52-3914)
FEATURES             Location/Qualifiers
     source          1..5646
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /clone="fh13117"
                     /tissue_type="brain"
                     /clone_lib="pBluescriptII SK plus"
     gene            1..5646
                     /gene="KIAA1317"
     CDS             <1071..2378
                     /gene="KIAA1317"
                     /note="Start codon is not identified."
                     /codon_start=1
                     /product="KIAA1317 protein"
                     /protein_id="BAA92555.1"
                     /db_xref="GI:7243015"
                     /translation="QQQKKGTMALSGNCSRYYPREQGSAVPNSFPEVVELNVGGQVYF
                     TRHSTLISIPHSLLWKMFSPKRDTANDLAKDSKGRFFIDRDGFLFRYILDYLRDRQVV
                     LPDHFPEKGRLKREAEYFQLPDLVKLLTPDEIKQSPDEFCHSDFEDASQGSDTRICPP
                     SSLLPADRKWGFITVGYRGSCTLGREGQADAKFRRVPRILVCGRISLAKEVFGETLNE
                     SRDPDRAPERYTSRFYLKFKHLERAFDMLSECGFHMVACNSSVTASFINQYTDDKIWS
                     SYTEYVFYREPSRWSPSHCDCCCKNGKGDKEGESGTSCNDLSTSSCDSQSEASSPQET
                     VICGPVTRQTNIQTLDRPIKKGPVQLIQQSEMRRKSDLLRTLTSGSRESNMSSKKKAV
                     KEKLSIEEELEKCIQDFLKIKIPDRFPERKHPWQSELLRKYHL"
BASE COUNT  1618 a 1169 c 1150 g 1709 t

ORIGIN 
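
The CDS annotation in this record can be checked against the database row of the alignment that follows. The sketch below translates the first nine codons of the CDS with the standard genetic code; with /codon_start=1 the reading frame begins at base 1071. The 27 bases are taken from the Db row that starts at position 1060. The compact codon-table encoding and all names are illustrative choices, not part of the report.

```python
# Standard genetic code, codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nt):
    """Translate a nucleotide string codon by codon; a trailing partial codon is ignored."""
    nt = nt.upper()
    usable = len(nt) - len(nt) % 3
    return "".join(CODON_TABLE[nt[i:i + 3]] for i in range(0, usable, 3))


CDS_START, CDS_END = 1071, 2378                 # from the CDS feature (<1071..2378)
cds_codons = (CDS_END - CDS_START + 1) // 3     # whole codons spanned by the feature

first_bases = "CAGCAGCAGAAGAAAGGGACAATGGCT"     # Db positions 1071..1097
print(cds_codons, translate(first_bases))
```

The nine residues agree with the start of the /translation qualifier. The feature spans 436 codons; the last of them (Db 2376..2378) reads TAA in the alignment, which suggests 435 residues followed by a stop codon.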

Query Match 92.2%; Score 3198.4; DB 9; Length 5646; 

Best Local Similarity 99.2%; Pred. No. 0; 

Matches 3225; Conservative 0; Mismatches 26; Indels 1; Gaps 1; 
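
The Matches, Mismatches and Indels counts above, and the midline printed between each Qy/Db row pair, can be recomputed from any pair of aligned rows. This is a minimal sketch, assuming that "|" marks an identical base, a blank marks a mismatch or a gap, and "-" is the gap character. The function names are illustrative and are not part of the search program.

```python
def midline(qy, db):
    """Return the match line for two aligned rows of equal length."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have the same length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(qy, db))


def tally(qy, db):
    """Count (matches, mismatches, indels) column by column."""
    matches = mismatches = indels = 0
    for a, b in zip(qy, db):
        if a == "-" or b == "-":
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


# Final row pair of this alignment (Qy 3423..3434 against Db 3999..4010).
qy_row = "AAAAAAAAAAAA"
db_row = "AACAATGACTAA"
print(repr(midline(qy_row, db_row)), tally(qy_row, db_row))
```

Summing the tallies over every row pair of an alignment should reproduce the totals printed in its summary line.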

Qy    183 ACCAATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGA 242
          || || ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    760 ACTAAGACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGA 819

Qy    243 AGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCA 302
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    820 AGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCA 879

Qy    303 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 362
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    880 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 939

Qy    363 TTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTT 422
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    940 TTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTT 999

Qy    423 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 482
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1000 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 1059

Qy    483 AATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTT 542
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1060 AATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTT 1119

Qy    543 ATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGA 602
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1120 ATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGA 1179

Qy    603 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 662
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1180 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 1239

Qy    663 TCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCA 722
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1240 TCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCA 1299

Qy    723 AGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCA 782
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1300 AGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCA 1359

Qy    783 GGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAG 842
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1360 GGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAG 1419

Qy    843 CTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAA 902
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1420 CTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAA 1479

Qy    903 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 962
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1480 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 1539

Qy    963 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1022
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1540 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1599

Qy   1023 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1082
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1600 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1659

Qy   1083 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1142
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1660 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1719

Qy   1143 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1202
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1720 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1779

Qy   1203 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1262
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1780 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1839

Qy   1263 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1322
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1840 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1899

Qy   1323 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1382
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1900 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1959

Qy   1383 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 1442
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1960 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 2019

Qy   1443 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 1502
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2020 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 2079

Qy   1503 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 1562
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2080 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 2139

Qy   1563 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGA 1622
          |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Db   2140 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGA 2199

Qy   1623 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 1682
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2200 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 2259

Qy   1683 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATC 1742
          |||||||||||||||||||||||||||||||||||||||||||||  |||||||||||||
Db   2260 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATC 2319

Qy   1743 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 1802
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2320 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 2379

Qy   1803 GGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAA 1862
          |||||||||||||| ||||||| ||||||||| ||||||||||||||||||||| |||||
Db   2380 GGAGGGCTGGGGGC-GGGAAAAGAAAAAAAAAAAGTCATTTTGAAATTAACCTCCTAAAA 2438

Qy   1863 GGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAAT 1922
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2439 GGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAAT 2498

Qy   1923 AGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGG 1982
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2499 AGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGG 2558

Qy   1983 GTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTA 2042
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2559 GTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTA 2618

Qy   2043 CTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTG 2102
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2619 CTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTG 2678

Qy   2103 AGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCT 2162
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2679 AGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCT 2738

Qy   2163 CCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTT 2222
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2739 CCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTT 2798

Qy   2223 AATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTT 2282
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2799 AATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTT 2858

Qy   2283 CACTCAAATCTATATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACA 2342
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2859 CACTCAAATCTATATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACA 2918

Qy   2343 CAAGCACAACTAAGTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCC 2402
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2919 CAAGCACAACTAAGTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCC 2978

Qy   2403 AAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTC 2462
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   2979 AAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTC 3038

Qy   2463 TGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTA 2522
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3039 TGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTA 3098

Qy   2523 ACATCAAATGACTCTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCA 2582
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3099 ACATCAAATGACTCTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCA 3158

Qy   2583 GGCTCCATTTTACTGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCA 2642
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3159 GGCTCCATTTTACTGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCA 3218

Qy   2643 AACATTCCTTGTGTTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGAC 2702
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3219 AACATTCCTTGTGTTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGAC 3278

Qy   2703 TCCACACTCAGCCTTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGA 2762
          ||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||
Db   3279 TCCACACTCAGCCTTCTCTACCCTGAATTGAATTATCACCCTTTTCTCCATGTTTTCAGA 3338

Qy   2763 GTTCTTACTGCCCACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGT 2822
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3339 GTTCTTACTGCCCACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGT 3398

Qy   2823 GTTCCTGTGTTGTTGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTT 2882
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3399 GTTCCTGTGTTGTTGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTT 3458

Qy   2883 TTGTTTGTTTTAGAGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTT 2942
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3459 TTGTTTGTTTTAGAGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTT 3518

Qy   2943 TAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCG 3002
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3519 TAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCG 3578

Qy   3003 GGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAAC 3062
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3579 GGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAAC 3638

Qy   3063 CCAACAAGGTAACTGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTT 3122
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3639 CCAACAAGGTAACTGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTT 3698

Qy   3123 TTCAATTACATCCTGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGT 3182
          |||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||
Db   3699 TTCAATTACATCCTGACTTGTATAGACACAGCCAGAAAGAAACTGTTAATAGCCATCCGT 3758

Qy   3183 CCATGTAACTCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAG 3242
          ||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||
Db   3759 CCATGTAACTCTGTATTTTACTAAGATACCAATAGCTCTTTCATAGACTTGTGCTACAAG 3818

Qy   3243 AAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAA 3302
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3819 AAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAA 3878

Qy   3303 ATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTG 3362
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   3879 ATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTG 3938

Qy   3363 TGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAA 3422
          ||||||||||||||||||||||||||||||||||||||||||||||      |  |  ||
Db   3939 TGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAA 3998

Qy   3423 AAAAAAAAAAAA 3434
          || ||  |  ||
Db   3999 AACAATGACTAA 4010
RESULT 2
LOCUS       AC019335 182638 bp DNA linear HTG 07-JUL-2000
DEFINITION  Homo sapiens chromosome 5 clone RP11-427K3, WORKING DRAFT SEQUENCE,
            18 unordered pieces.
ACCESSION   AC019335
VERSION     AC019335.2  GI:7231064
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1 (bases 1 to 182638)
  AUTHORS   Waterston,R.H.
  TITLE     The sequence of Homo sapiens clone
  JOURNAL   Unpublished
REFERENCE   2 (bases 1 to 182638)
  AUTHORS   Waterston,R.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JAN-2000) Genome Sequencing Center, Washington
            University School of Medicine, 4444 Forest Park Parkway, St. Louis,
            MO 63108, USA
COMMENT     On Mar 13, 2000 this sequence version replaced gi:6652510.

            Genome Center
            Center: Washington University Genome Sequencing Center
            Center code: WUGSC
            Web site: http://genome.wustl.edu/gsc/index.shtml

            Project Information
            Center project name: H_NH0427K03

            Summary Statistics
            Sequencing vector: M13; 87%
            Sequencing vector: plasmid; 13%
            Chemistry: Dye-primer ET; 87% of reads
            Chemistry: Dye-terminator Big Dye; 13% of reads
            Assembly program: Phrap; version 0.990319
            Consensus quality: 174376 bases at least Q40
            Consensus quality: 176799 bases at least Q30
            Consensus quality: 178323 bases at least Q20
            Insert size: 182000; agarose-fp
            Insert size: 180938; sum-of-contigs
            Quality coverage: 4.40 in Q20 bases; agarose-fp
            Quality coverage: 4.46 in Q20 bases; sum-of-contigs

            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 18 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /clone="RP11-427K3"
     1..1193
                     /note="assembly_name:Contig4"
     1294..2928
                     /note="assembly_name:Contig5"
     3029..5126
                     /note="assembly_name:Contig6"
                     /note="assembly_name:Contig7"
                     /note="assembly_name
     59201..72609
                     /note="assembly_name
     72710..86964
                     /note="assembly_name
                     clone_end:T7
                     vector_side:right"
     87065..104522
                     /note="assembly_name
     104623..126360
                     /note="assembly_name
     126461..149770
                     /note="assembly_name
     149871..182638
                     /note="assembly_name
a 34459 c 33797 g

:Contig8"
:Contig9"
:Contig17

Query Match 59.0%; Score 2045.4; DB 2; Length 182638;

Best Local Similarity 98.7%; Pred. No. 0; 

Matches 2072; Conservative 0; Mismatches 26; Indels 1; Gaps 1; 
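
The percentages in the summary lines of the AB037738 and AC019335 hits can be reproduced from the other printed numbers. The sketch below assumes that Query Match is the hit score divided by the perfect score, and that Best Local Similarity is the number of matches divided by the number of alignment columns (matches + mismatches + indels). Neither formula is stated in the report; they are inferred because they reproduce the printed values. All names are illustrative.

```python
PERFECT_SCORE = 3468  # "Perfect score" from the run header


def query_match(score, perfect=PERFECT_SCORE):
    """Hit score as a percentage of the perfect (self-match) score."""
    return 100.0 * score / perfect


def best_local_similarity(matches, mismatches, indels):
    """Matches as a percentage of all alignment columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# Numbers copied from the two summary blocks printed in this report.
hits = {
    "AB037738": dict(score=3198.4, matches=3225, mismatches=26, indels=1,
                     qy_start=183, qy_end=3434),
    "AC019335": dict(score=2045.4, matches=2072, mismatches=26, indels=1,
                     qy_start=1336, qy_end=3434),
}

for name, h in hits.items():
    columns = h["matches"] + h["mismatches"] + h["indels"]
    qy_span = h["qy_end"] - h["qy_start"] + 1   # query bases covered by the alignment
    print(name,
          f"{query_match(h['score']):.1f}%",
          f"{best_local_similarity(h['matches'], h['mismatches'], h['indels']):.1f}%",
          columns, qy_span)
```

In both hits the single indel is a gap in the database row, so the number of alignment columns equals the number of query bases covered (Qy start to Qy end).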

Qy   1336 TGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 1395
          | |||| |   |||||||||||||||||||||||||||||||||||||||||||||||||
Db 179790 TTTCTTTTCAGGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 179731

Qy   1396 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 1455
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179730 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 179671

Qy   1456 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 1515
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179670 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 179611

Qy   1516 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 1575
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179610 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 179551

Qy   1576 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAG 1635
          ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
Db 179550 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGACTTCAGGCTCCAG 179491

Qy   1636 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 1695
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179490 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 179431

Qy   1696 GCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAG 1755
          ||||||||||||||||||||||||||||||||  ||||||||||||||||||||||||||
Db 179430 GCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATCGGTTTCCTGAGAG 179371

Qy   1756 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 1815
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179370 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 179311

Qy   1816 CGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTT 1875
          | ||||||| ||||||||| ||||||||||||||||||||| ||||||||||||||||||
Db 179310 C-GGGAAAAGAAAAAAAAAAAGTCATTTTGAAATTAACCTCCTAAAAGGAATTCATATTT 179252

Qy   1876 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 1935
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179251 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 179192

Qy   1936 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 1995
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179191 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 179132

Qy   1996 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 2055
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179131 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 179072

Qy   2056 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 2115
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179071 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 179012

Qy   2116 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 2175
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179011 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 178952

Qy   2176 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 2235
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178951 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 178892

Qy   2236 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 2295
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178891 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 178832

Qy   2296 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 2355
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178831 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 178772

Qy   2356 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 2415
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178771 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 178712

Qy   2416 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 2475
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178711 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 178652

Qy   2476 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 2535
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178651 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 178592

Qy   2536 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 2595
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178591 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 178532

Qy   2596 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 2655
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178531 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 178472

Qy   2656 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 2715
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178471 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 178412

Qy   2716 TTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 2775
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178411 TTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 178352

Qy   2776 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 2835
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178351 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 178292

Qy   2836 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 2895
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178291 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 178232



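Qy   2896 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 2955
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178231 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 178172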



Qy   2956 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 3015
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178171 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 178112

Qy   3016 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 3075
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178111 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 178052

Qy   3076 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 3135
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178051 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 177992

Qy   3136 TGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 3195
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177991 TGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 177932

Qy   3196 TATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 3255
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177931 TATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 177872

Qy   3256 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 3315
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177871 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 177812

Qy   3316 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 3375
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177811 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 177752

Qy   3376 GTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434
          |||||||||||||||||||||||||||||||||      |  |  |||| ||  |  ||
Db 177751 GTGTATCACAGGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 177693



RESULT 3 

AC008716/C 

LOCUS AC008716 184589 bp DNA linear PRI 17-OCT-2001
DEFINITION Homo sapiens chromosome 5 clone CTB-85P21, complete sequence.
ACCESSION AC008716
VERSION AC008716.7 GI:16195190
KEYWORDS HTG.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 184589)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 184589)
AUTHORS DOE Joint Genome Institute.
TITLE Direct Submission
JOURNAL Submitted (03-AUG-1999) Production Sequencing Facility, DOE Joint
Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
REFERENCE 3 (bases 1 to 184589)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Submitted (31-MAY-2000) DOE Joint Genome Institute, 2800 Mitchell
Drive, Walnut Creek, CA 94598, USA
REFERENCE 4 (bases 1 to 184589)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Submitted (17-OCT-2001) DOE Joint Genome Institute, 2800 Mitchell
Drive, Walnut Creek, CA 94598, USA
COMMENT On Oct 17, 2001 this sequence version replaced gi:8122137.
Draft Sequence Produced by DOE Joint Genome Institute
www.jgi.doe.gov
Finishing Completed at Stanford Human Genome Center
www-shgc.stanford.edu
Quality: Phrap Quality >=40 99.5% of Sequence;
Estimated Total Number of Errors is 0.9.
STS Content:
WI-13819 G21997.
FEATURES Location/Qualifiers
source 1..184589
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="5"
/clone="CTB-85P21"
BASE COUNT 60462 a 35809 c 33674 g 54644 t
ORIGIN



Query Match 58.9%; Score 2043.4; DB 9; Length 184589; 

Best Local Similarity 98.7%; Pred. No. 0; 

Matches 2071; Conservative 0; Mismatches 26; Indels 2; Gaps 1; 
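
The percentages on these summary lines can be cross-checked against the counts printed beside them. The sketch below is only an illustration. It assumes that Query Match is the score divided by the perfect score (3468, from the report header) and that Best Local Similarity is the number of matches divided by the number of alignment columns (matches + mismatches + indels). Both formulas are inferred from the figures in this listing, not taken from the search program's documentation, and the function names are hypothetical.

```python
# Cross-check of the summary percentages, under assumed formulas
# (inferred from the numbers in this listing, not from program documentation).

PERFECT_SCORE = 3468  # "Perfect score" from the report header


def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Assumed definition: score as a percentage of the perfect score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, mismatches, indels):
    """Assumed definition: matches as a percentage of all alignment columns."""
    columns = matches + mismatches + indels
    return round(100.0 * matches / columns, 1)


# (score, matches, mismatches, indels) as printed for RESULT 3, 4 and 5
results = {
    "AC008716": (2043.4, 2071, 26, 2),
    "AC008473": (2038.6, 2068, 29, 2),
    "AX405760": (1640.8, 1645, 7, 0),
}

for accession, (score, matches, mismatches, indels) in results.items():
    print(
        accession,
        query_match_pct(score),
        best_local_similarity_pct(matches, mismatches, indels),
    )
```

Under these assumptions the computed values agree with the printed Query Match and Best Local Similarity figures for all three hits.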

Qy   1336 TGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 1395
          | |||| |   |||||||||||||||||||||||||||||||||||||||||||||||||
Db 179121 TTTCTTTTCAGGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 179062

Qy   1396 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 1455
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179061 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 179002

Qy   1456 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 1515
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 179001 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 178942

Qy   1516 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 1575
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178941 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 178882

Qy   1576 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAG 1635
          ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
Db 178881 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGACTTCAGGCTCCAG 178822

Qy   1636 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 1695
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178821 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 178762

Qy   1696 GCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAG 1755
          ||||||||||||||||||||||||||||||||  ||||||||||||||||||||||||||
Db 178761 GCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATCGGTTTCCTGAGAG 178702

Qy   1756 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 1815
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178701 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 178642

Qy   1816 CGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTT 1875
          ||||  |||| |||||||| ||||||||||||||||||||| ||||||||||||||||||
Db 178641 CGGG--AAAAGAAAAAAAAAAGTCATTTTGAAATTAACCTCCTAAAAGGAATTCATATTT 178584

Qy   1876 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 1935
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178583 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 178524

Qy   1936 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 1995
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178523 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 178464

Qy   1996 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 2055
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178463 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 178404

Qy   2056 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 2115
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178403 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 178344

Qy   2116 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 2175
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178343 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 178284

Qy   2176 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 2235
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178283 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 178224

Qy   2236 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 2295
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178223 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 178164

Qy   2296 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 2355
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178163 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 178104

Qy   2356 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 2415
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178103 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 178044

Qy   2416 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 2475
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 178043 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 177984

Qy   2476 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 2535
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177983 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 177924

Qy   2536 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 2595
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177923 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 177864

Qy   2596 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 2655
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177863 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 177804

Qy   2656 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 2715
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177803 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 177744

Qy   2716 TTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 2775
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177743 TTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 177684

Qy   2776 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 2835
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177683 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 177624

Qy   2836 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 2895
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177623 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 177564

Qy   2896 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 2955
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177563 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 177504

Qy   2956 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 3015
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177503 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 177444

Qy   3016 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 3075
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177443 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 177384

Qy   3076 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 3135
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177383 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 177324

Qy   3136 TGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 3195
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177323 TGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 177264

Qy   3196 TATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 3255
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177263 TATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 177204

Qy   3256 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 3315
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177203 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 177144

Qy   3316 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 3375
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 177143 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 177084

Qy   3376 GTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434
          |||||||||||||||||||||||||||||||||      |  |  |||| ||  |  ||
Db 177083 GTGTATCACAGGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 177025



RESULT 4 

AC008473/C 

LOCUS AC008473 98360 bp DNA linear PRI 03-OCT-2001
DEFINITION Homo sapiens chromosome 5 clone CTC-375J15, complete sequence.
ACCESSION AC008473
VERSION AC008473.6 GI:15887240
KEYWORDS HTG.
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1 (bases 1 to 98360)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 98360)
AUTHORS DOE Joint Genome Institute.
TITLE Direct Submission
JOURNAL Submitted (03-AUG-1999) Production Sequencing Facility, DOE Joint
Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
REFERENCE 3 (bases 1 to 98360)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Submitted (01-SEP-2000) DOE Joint Genome Institute, 2800 Mitchell
Drive, Walnut Creek, CA 94598, USA
REFERENCE 4 (bases 1 to 98360)
AUTHORS DOE Joint Genome Institute and Stanford Human Genome Center.
TITLE Direct Submission
JOURNAL Submitted (03-OCT-2001) DOE Joint Genome Institute, 2800 Mitchell
Drive, Walnut Creek, CA 94598, USA
COMMENT On Oct 3, 2001 this sequence version replaced gi:9958005.
Draft Sequence Produced by DOE Joint Genome Institute
www.jgi.doe.gov
Finishing Completed at Stanford Human Genome Center
www-shgc.stanford.edu
Quality: Phrap Quality >=40 99.1% of Sequence;
Estimated Total Number of Errors is 0.8.
STS Content:
SHGC-103102 G57424.
FEATURES Location/Qualifiers
source 1..98360
/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
/chromosome="5"
/clone="CTC-375J15"
BASE COUNT 31520 a 19442 c 18516 g 28882 t
ORIGIN



Query Match 58.8%; Score 2038.6; DB 9; Length 98360;

Best Local Similarity 98.5%; Pred. No. 0;

Matches 2068; Conservative 0; Mismatches 29; Indels 2; Gaps 1;



Qy 1336 TGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 1395
        | |||| |   |||||||||||||||||||||||||||||||||||||||||||||||||
Db 6692 TTTCTTTTCAGGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 6633

Qy 1396 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 1455
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6632 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 6573

Qy 1456 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 1515
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6572 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 6513

Qy 1516 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 1575
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6512 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 6453

Qy 1576 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAG 1635
        ||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||
Db 6452 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGACTTCAGGCTCCAG 6393

Qy 1636 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 1695
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6392 GGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGA 6333

Qy 1696 GCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAG 1755
        ||||||||||||||||||||||||||||||||  ||||||||||||||||||||||||||
Db 6332 GCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATCGGTTTCCTGAGAG 6273

Qy 1756 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 1815
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6272 AAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 6213

Qy 1816 CGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTT 1875
        ||||  |||| |||||||| ||||||||||||||||||||| ||||||||||||||||||
Db 6212 CGGG--AAAAGAAAAAAAAAAGTCATTTTGAAATTAACCTCCTAAAAGGAATTCATATTT 6155

Qy 1876 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 1935
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6154 TAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATAT 6095

Qy 1936 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 1995
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6094 ACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTC 6035

Qy 1996 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 2055
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 6034 TAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTTTACTTCGTCCCATGT 5975

Qy 2056 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 2115
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5974 GCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGA 5915

Qy 2116 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 2175
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5914 GTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGG 5855

Qy 2176 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 2235
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5854 GTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTT 5795

Qy 2236 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 2295
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5794 CCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTTTTCACTCAAATCTAT 5735

Qy 2296 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 2355
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5734 ATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAACACAAGCACAACTAA 5675

Qy 2356 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 2415
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5674 GTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTC 5615

Qy 2416 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 2475
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5614 CTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTA 5555

Qy 2476 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 2535
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5554 AAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTCTAACATCAAATGACT 5495

Qy 2536 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 2595
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5494 CTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTAC 5435

Qy 2596 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 2655
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5434 TGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGTCAAACATTCCTTGTG 5375

Qy 2656 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 2715
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5374 TTAAAAAAATCAAACATTCATATCCACAAAATTTTCTGCTAAATGACTCCACACTCAGCC 5315

Qy 2716 TTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 2775
        |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
Db 5314 TTCTCTACCCTGAATTGAATTATCACCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCC 5255

Qy 2776 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 2835
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5254 ACAGTTTAATGGTGTGGCCTTTCCACATAATCCACATTAAGTTCTGTGTTCCTGTGTTGT 5195

Qy 2836 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 2895
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5194 TGTGGAACTAAGGACAACACACAGTACTTGAATAAGGGTCCGGCCTTTTGTTTGTTTTAG 5135

Qy 2896 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 2955
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5134 AGAAAGTTGTATTCCACACACAACCTAATAATTTCTTATAAAAATTTTAAACTACAAAGC 5075

Qy 2956 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 3015
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5074 TACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTCGGGCTTTGGCTGTG 5015

Qy 3016 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 3075
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 5014 CCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAAGGTAAC 4955

Qy 3076 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 3135
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4954 TGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCTTTTTCAATTACATCC 4895

Qy 3136 TGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 3195
        ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
Db 4894 TGACTTGTATAGACACAGCCAGAAAGAAACTGTTAATAGCCATCCGTCCATGTAACTCTG 4835

Qy 3196 TATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 3255
        |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||
Db 4834 TATTTTACTAAGATACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGAC 4775

Qy 3256 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 3315
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4774 CAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAAT 4715

Qy 3316 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 3375
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 4714 TAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTAT 4655

Qy 3376 GTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434
        |||||||||||||||||||||||||||||||||      |  |  |||| ||  |  ||
Db 4654 GTGTATCACAGGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 4596



RESULT 5 
AX405760 

LOCUS AX405760 2412 bp DNA linear PAT 14-JUN-2002 

DEFINITION Sequence 175 from Patent WO0222660. 
ACCESSION AX405760 

VERSION AX405760.1 GI:21438959 

KEYWORDS 

SOURCE Homo sapiens (human) 

ORGANISM Homo sapiens 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE 1

AUTHORS Tang,Y.T., Liu,C., Zhou,P., Asundi,V., Zhang,J., Zhao,Q.A., Ren,F.,

Xue,A.J., Yang,Y., Wehrman,T. and Drmanac,R.T.
TITLE Novel nucleic acids and polypeptides 

JOURNAL Patent: WO 0222660-A 175 21-MAR-2002; 
HYSEQ, INC. (US) 
FEATURES Location/Qualifiers
source 1..2412

/organism="Homo sapiens"
/mol_type="genomic DNA"
/db_xref="taxon:9606"
CDS 1092..2378

/note="unnamed protein product"
/codon_start=1
/protein_id="CAD34867.1"
/db_xref="GI:21438960"

/translation="MALSGNCSRYYPREQGSAVPNSFPEVVELNVGGQVYFTRHSTLI
SIPHSLLWKMFSPKRDTANDLAKDSKGRFFIDRDGFLFRYILDYLRDRQVVLPDHFPE
KGRLKREAEYFQLPDLVKLLTPDEIKQSPDEFCHSDFEDASQGSDTRICPPSSLLPAD 
RKWGFITVGYRGSCTLGREGQADAKFRRVPRILVCGRISLAKEVFGETLNESRDPDRA 
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EELEKCIQDFLKIKIPDRFPERKHPWQSELLRKYHL" 

BASE COUNT 638 a 585 c 551 g 638 t 

ORIGIN 

Query Match 47.3%; Score 1640.8; DB 6; Length 2412;

Best Local Similarity 99.6%; Pred. No. 0; 

Matches 1645; Conservative 0; Mismatches 7; Indels 0; Gaps 0; 



Qy 183 AC CAAT AC GGAC AT CT GAGT AACT GGGGAATT GGCCT GC CT T GCAT GT GAGCT T GAT GGA 242 

II II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I 
Db 760 ACTAAGAC GGAC AT CT GAGT AACT GGGGAAT T GGCCT GC CT TGCAT GT GAGCTT GAT GGA 819 

Qy 243 AGATT GGATAT AGACGAGTT GATT ATATTTTATGAAGT AGCAGCT CACTACCATCCACCA 302 



1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 820 AGATT GGAT AT AGAC GAGT T GATT ATATTTT ATGAAGT AGC AGCT CACT AC CAT C CACC A 879 

Qy 303 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 362 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 880 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 939 

Qy 363 TT CT T GGGGGAAAAAT ACT GGGAT AAGAGGAGGT C AT TT T T T AATAAGT T AGC AT C CTT T 422 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 940 TT CT T GGGGGAAAAAT ACT GGGAT AAGAGGAGGT C ATTT T TT AATAAGT T AGCAT CCTT T 999 

Qy 423 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 482 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1000 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 1059 

Qy 4 83 AATCAAAAT AGCAGCAGCAGAAGAAAGGGACAATGGCTCT GAGTGGAAACT GTAGTCGTT 542 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 1060 AAT CAAAAT AGC AGC AGCAGAAGAAAGGGACAAT GGCT CT GAGT GGAAACT GT AGT C GT T 1119 

Qy 543 AT T AT CCT C GAGAACAAGGGT C CGCAGTT CCCAACT C CTT CC CT GAGGT GGT AGAGCT GA 602 

I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1120 ATTAT C CT C GAGAACAAGGGT C C GCAGTT C C CAACT CCT T C C CT GAGGT GGT AGAGCT GA 1179 

Qy 603 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 662 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 1180 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 1239 

Qy 663 T C CT GT GGAAAAT GTTTT C CCCAAAGAGAGACAC GGCT AAT GAT CT AGC CAAGGACT C C A 722 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1240 T CCT GT GGAAAAT GT TTT CC CCAAAGAGAGACAC GGCTAAT GAT CT AGCCAAGGACT CCA 1299 

Qy 723 AGGGAAGGTTTTTCATTGACAGAGATGGATT CTT GTTCC GTTAT ATT CTGGACTAT CTCA 782 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1300 AGGGAAGGTTTTTCATTGACAGAGATGGATT CTT GTTCCGTTATATT CTGGACTAT CTCA 1359 

Qy 783 GGGACAGG C AGGT GGT CCT GCCT GAT CACTTT CC AGAAAAAGGAAGACT GAAAAGGGAAG 842 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1360 GGGACAGGC AGGT GGTCCT G CCT GAT CACTT T CC AGAAAAAGGAAGACT GAAAAGGGAAG 1419 

Qy 843 CT GAAT ACT T C C AGCTCC CAGACT T GGTCAAACT C CT GAC C C CC GAT GAAAT CAAGCAAA 902 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1420 CTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAA 1479

Qy  903 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 962
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1480 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 1539

Qy  963 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1022
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1540 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1599

Qy 1023 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1082
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1600 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1659

Qy 1083 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1142
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1660 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1719

Qy 1143 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1202
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1720 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1779

Qy 1203 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1262
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1780 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1839

Qy 1263 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1322
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1840 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1899

Qy 1323 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1382
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1900 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1959

Qy 1383 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 1442
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1960 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 2019

Qy 1443 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 1502
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2020 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 2079

Qy 1503 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 1562
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2080 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 2139

Qy 1563 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGA 1622
        |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Db 2140 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGA 2199

Qy 1623 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 1682
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2200 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 2259

Qy 1683 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATC 1742
        |||||||||||||||||||||||||||||||||||||||||||||  |||||||||||||
Db 2260 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATC 2319

Qy 1743 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 1802
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2320 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 2379

Qy 1803 GGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAA 1834
        ||||||||||||||||| ||| ||||||||||
Db 2380 GGAGGGCTGGGGGCGGGAAAAGAAAAAAAAAA 2411
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KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 171949)
  AUTHORS   DOE Joint Genome Institute and Stanford Human Genome Center.
  TITLE     Direct Submission
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 171949)
  AUTHORS   DOE Joint Genome Institute.
  TITLE     Direct Submission
  JOURNAL   Submitted (03-AUG-1999) Production Sequencing Facility, DOE Joint
            Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA 94598, USA
REFERENCE   3  (bases 1 to 171949)
  AUTHORS   DOE Joint Genome Institute and Stanford Human Genome Center.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-JUL-2001) DOE Joint Genome Institute, 2800 Mitchell
            Drive, Walnut Creek, CA 94598, USA
COMMENT     On Jul 31, 2001 this sequence version replaced gi:9256021.
            Draft Sequence Produced by DOE Joint Genome Institute
            www.jgi.doe.gov
            Finishing Completed at Stanford Human Genome Center
            www-shgc.stanford.edu
            Quality: Phrap Quality >=40 99.9% of Sequence;
            Estimated Total Number of Errors is 0.2.
FEATURES             Location/Qualifiers
     source          1..171949
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /clone="CTB-18Fl M .
BASE COUNT    54666 a  33484 c  31658 g  52141 t
ORIGIN



Query Match 33.5%; Score 1161.8; DB 9; Length 171949; 

Best Local Similarity 99.8%; Pred. No. 1.4e-240; 

Matches 1163; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 

Qy    187 ATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGAAGAT 246
          | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 171266 AGACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGAAGAT 171207

Qy    247 TGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCATCCA 306
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 171206 TGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCATCCA 171147

Qy    307 GGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGATTTCT 366
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 171146 GGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGATTTCT 171087

Qy    367 TGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCC 426
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 171086 TGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCC 171027

Qy    427 TTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATC 486
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 171026 TTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATC 170967

Qy    487 AAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTA 546
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170966 AAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTA 170907

Qy    547 TCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGT 606
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170906 TCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGT 170847

Qy    607 CGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCT 666
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170846 CGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCT 170787

Qy    667 GTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGG 726
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170786 GTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGG 170727

Qy    727 AAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGA 786
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170726 AAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGA 170667

Qy    787 CAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGA 846
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170666 CAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGA 170607

Qy    847 ATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGCCC 906
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170606 ATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGCCC 170547

Qy    907 AGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAATCTG 966
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170546 AGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAATCTG 170487

Qy    967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170486 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 170427

Qy   1027 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 1086
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170426 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 170367

Qy   1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170366 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 170307

Qy   1147 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 1206
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170306 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 170247

Qy   1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170246 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 170187

Qy   1267 CTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAAGCTA 1326
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 170186 CTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAAGCTA 170127

Qy   1327 CACTGAATATGTCTTCTACCGTGAG 1351
          |||||||||||||||||||||| ||
Db 170126 CACTGAATATGTCTTCTACCGTAAG 170102



RESULT 7
AC008383
LOCUS       AC008383 209114 bp DNA linear PRI 01-MAY-2001
DEFINITION  Homo sapiens chromosome 5 clone CTC-222022, complete sequence.
ACCESSION   AC008383
VERSION     AC008383.8  GI:13899395
KEYWORDS    HTG.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
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REFERENCE   1  (bases 1 to 209114)
  AUTHORS   DOE Joint Genome Institute and Stanford Human Genome Center.
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  JOURNAL   Unpublished
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            www-shgc.stanford.edu
            Quality: Phrap Quality >=40 99.5% of Sequence;
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            Estimated Total Number of Errors is 0.8.
FEATURES             Location/Qualifiers
     source          1..209114
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="5"
                     /clone="CTC-222022"
BASE COUNT    65378 a  39329 c  40172 g  64235 t



Query Match 33.5%; Score 1161.8; DB 9; Length 209114; 

Best Local Similarity 99.8%; Pred. No. 1.4e-240; 

Matches 1163; Conservative 0; Mismatches 2; Indels 0; Gaps 0; 



Qy    187 ATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGAAGAT 246
          | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168231 AGACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGAAGAT 168290

Qy    247 TGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCATCCA 306
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168291 TGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCATCCA 168350

Qy    307 GGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGATTTCT 366
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168351 GGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGATTTCT 168410

Qy    367 TGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCC 426
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168411 TGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCC 168470

Qy    427 TTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATC 486
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168471 TTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATC 168530

Qy    487 AAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTA 546
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168531 AAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTA 168590

Qy    547 TCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGT 606
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168591 TCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGT 168650

Qy    607 CGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCT 666
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168651 CGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCT 168710

Qy    667 GTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGG 726
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168711 GTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGG 168770

Qy    727 AAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGA 786
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168771 AAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGA 168830

Qy    787 CAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGA 846
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168831 CAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGA 168890

Qy    847 ATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGCCC 906
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168891 ATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGCCC 168950

Qy    907 AGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAATCTG 966
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 168951 AGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAATCTG 169010

Qy    967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169011 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 169070

Qy   1027 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 1086
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169071 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 169130

Qy   1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169131 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 169190

Qy   1147 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 1206
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169191 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 169250

Qy   1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169251 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 169310

Qy   1267 CTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAAGCTA 1326
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 169311 CTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAAGCTA 169370

Qy   1327 CACTGAATATGTCTTCTACCGTGAG 1351
          |||||||||||||||||||||| ||
Db 169371 CACTGAATATGTCTTCTACCGTAAG 169395
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AC127249/C 

LOCUS       AC127249 135132 bp DNA linear HTG 12-JUN-2003
DEFINITION  Mus musculus chromosome UNK clone RP24-475B8, WORKING DRAFT
            SEQUENCE, 4 unordered pieces.
ACCESSION   AC127249
VERSION     AC127249.3  GI:31621481
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ACTIVEFIN.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 135132)
  AUTHORS   Wilson,R.K.
  TITLE     The sequence of Mus musculus clone
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 135132)
  AUTHORS   McPherson,J.D. and Waterston,R.H.
  TITLE     Direct Submission
  JOURNAL   Submitted (14-JUL-2002) Genome Sequencing Center, 4444 Forest Park
            Parkway, St. Louis, MO 63108, USA
REFERENCE   3  (bases 1 to 135132)
  AUTHORS   Wilson,R.K.
  TITLE     Direct Submission
  JOURNAL   Submitted (12-JUN-2003) Genome Sequencing Center, 4444 Forest Park
            Parkway, St. Louis, MO 63108, USA
COMMENT     On Jun 12, 2003 this sequence version replaced gi:21886968.
            Genome Center
            Center: Washington University Genome Sequencing Center
            Center code: WUGSC
            Web site: http://genome.wustl.edu
            Contact: submissions@watson.wustl.edu
            Project Information
            Center project name: M_BB0475B08
            Summary Statistics
            Sequencing vector: M13; 0%
            Sequencing vector: plasmid; 100%
            Chemistry: Dye-primer ET; 0% of reads
            Chemistry: Dye-terminator Big Dye; 100% of reads
            Assembly program: Phrap; version 0.990319
            Consensus quality: 133026 bases at least Q40
            Consensus quality: 133293 bases at least Q30
            Consensus quality: 133464 bases at least Q20
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 4 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
            * be preserved.
            *        1     8786: contig of 8786 bp in length
            *     8787     8886: gap of unknown length
            *     8887    23581: contig of 14695 bp in length
            *    23582    23681: gap of unknown length
            *    23682    43616: contig of 19935 bp in length
            *    43617    43716: gap of unknown length
            *    43717   135132: contig of 91416 bp in length.
FEATURES             Location/Qualifiers
     source          1..135132
                     /organism="Mus musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10090"
                     /chromosome="UNK"
                     /clone="RP24-475B8"
     misc_feature    1..8786
                     /note="assembly_name:Contig21"
     misc_feature    8887..23581
                     /note="assembly_name:Contig22"
     misc_feature    23682..43616
                     /note="assembly_name:Contig23"
     misc_feature    43717..135132
                     /note="assembly_name:Contig24"
BASE COUNT    44115 a  26731 c  25600 g  38364 t    322 others
ORIGIN



Query Match 24.6%; Score 853.2; DB 2; Length 135132; 

Best Local Similarity 70.7%; Pred. No. 7.5e-174; 

Matches 1473; Conservative 0; Mismatches 523; Indels 86; Gaps 22; 

Qy 1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406 

I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I II I 
Db 124743 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 

124684 



Qy 1407 ACAAAGAAGG GGAGAGC GGC AC GT CT T GCAAT GAC CT CT CCAC AT CT AGCT GCGACAGC C 1466 

III I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I II II I I I I I I I I I I 

Db 124683 ACA AAGGAGAGAGCGGCAC CT C CT GCAAT GAC CT GT C CACT T CC AGCT GT GAC AGC C 

124627 



Qy 1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I II 
Db 124626 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 

124567 



Qy 1527 ACAT C CAGACT CT GGAC C GT CCCAT CAAGAAGGGCC CTGT C CAGCT GAT CCAACAGT CAG 1586 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II II I I I I II I I I I I I I I I I I I I 
Db 124566 ACAT C CAGACT CT GGAT C GGC C CAT CAAGAAAGGTCCGGTG CAGCT GAT CCAACAGT CAG 

124507 



Qy 1587 AGAT GC GGCGGAAAAGCGACT T ACT C CGGAT T CT GACTT CAGGCT C C AGGGAAT C GAACA 1646 

I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 124506 AGAT GAGGC GGAAAAGT GAC CT GCT C C GGACT CT GAC GT CAGGCT C CAGGGAGT C GAACA 

124447 



Qy 1647 T GAGC AGC AAAAAAAAAGCT GTT AAAGAAAAGCT CT CAAT T GAGGAG GAGCT GGAGAAAT 1706 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II II I I I I I I I I I I I I I I I I II 
Db 124446 TAAGC AGCAAAAAGAAAGCT GC GAAGGAAAAGCT CT C CAT C GAGGAAGAGCT GGAGAAAT 

124387 



Qy 1707 GTAT CCAGGATTT CCTAAAAAAAAAAATTCCAGAT CGGTTTCCTGAGAGAAAACAT CCTT 1766 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 124386 GT AT CC AGGAT T T CTT GAAGATAAAAATT C CAGAT CGCTT C C CT GAGC GAAAAC AT CCTT 

124327 

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826 

I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I II III 

Db 124326 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG 

124278 

Qy 1827 AAAAAAAAGAGT CATTTTGAAATTAACCT CATAAAAGGAATTCAT ATTTTAAAGGAAAAA 1886 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 124277 GT AGT CGC C ACTTT GAAAT AAAC CT C C CCAAAGGAAGACAT AT GTT AAAGGAAAAA 

124222 

Qy 1887 AAT ACAACT AAT GAT GC AC AT T TCT T AGAACACAAT AGT C CAT T GAT AT ACT ACT GCCT A 1946 

I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I 
Db 124221 T A- ACAACTAAC GGT C C AC AT TTGT T AGAT CACAAT - GT C C AT TGAT GT ACT ACT G CCT A 

124164 

Qy 1947 CTTT AC CT AGT T C AC CTTAACAT GT AAAT C C ACAGGGT AGAT T T CT TT CT AGAT GT GGAA 2006 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 124163 CTTT GC CT AGCT C ACCTTAACGTGTAAATCCACAGGGT AGAT TT CTT TCT AGAT GT GGAA 

124104 

Qy 2007 GT ACAAGAAAAT CTT T T TT AGTTAT TT G TTTGTTTACTTCGTCCCATGTGCTAACTA 2063 

III I II III II I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 124103 CCAGAAACGAGCTCTTAGTTGTCCTTTGTCTTTTATTTACTTGGTCCCATGTGCTGAGAA 

124044 

Qy 2064 T CTT - AT AT AT AAT GAGAGC C AGCT AC GT AAAAGT AGCT GAGAGGC CTT GGGAGT C ATT T 2122 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II II I 
Db 124043 TCTTAAGATACAACAAGAACCAGCTACGTGTGAGTAGCTCACAGGCTTTGGGAATCATTG

123984 

Qy 2123 ATCCCAAACTGGGTTTTTT CTCTCATCCTTCTAC 2156 

MINIMI I I I I I I I I I II I I I I I I I I I 

Db 123983 ATCCCAAACCAGGTTTTTTTGTTTTGTTTTGTTTTGTTTTGTTTACTCTCATTTTTCTGC 

123924 

Qy 2157 CTCCCTCCTTTGA--ATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTG 2214

I I I I I I I I I I III II I I I I I I I I I I III I I I I I I I I I 
Db 123923 CT C CTC C C CTT GACCAAGAAT GGAC AGTT GAAGGAGAT ATAAC CC GGT GGCTT AT GTTAA 

123864 

Qy 2215 GAATTTTTAATTTTGGTTTTTCCT---TTTGTTTATGGGGTT--GGGGGGAATGGCAGAT 2269

I I I I I I I I I I Mill I I II II I I I I I M I I I I I I I I I I I I I I I I 
Db 123863 GAAATTATCCTTTTCCCTTTCCTTTTGTTTGTTTATGGGGTTGAGGGGAGAATGGCAAAT 
123804 

Qy 2270 TTATATGACTTTTCACTCAAATCTATATGTGCCAGTTTATATTGACTCCGTATGCATGAG 2329

II I I I II I I I I I II I I I I I I I II I I I I I I II II I I I II M I I I I I I I I II I I 
Db 123803 T T GT AT GAT T T T T CACTAAAAT CT CT AT GT GCC AGGTT CT AT T GACT T T GT AT GCAT GAG 
123744 



Qy 2330 TATTTGTGCAACACAAGCA-CAACTAAGTATGTATATACACATGACGCACACGATGCCAG 2388
I I I I I III II I I I I Mill I II I III I I 



Db 123743 CGTTTCTGACACAAGCACAGTATATGTCTGTATATATGCACAAAGAATGCACACGACCTA

123684 

Qy 2389 GGCCTAGACCTCCCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCA 2448

III I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I 

Db 123683 GGGCCTGGACAGCAGAGGGCTAACATCTTACTATCAGCTGCCC-CTACAAGAGCACTTCA 

123625 

Qy 2449 GATGGATGAGCTTCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTT 2508 

II I I I I I I I I I I I I II II I I II I I I I I II I I I I I I I I I I I III 

Db 123624 GACAACCAAGCCTCTGCCTATTTATTAAAACCCTCCTGGGCAGATTTCCCAGCCTCCCTT 

123565 

Qy 2509 CACAACACTTT CT AACAT CAAAT GACT CT C AT CAT CAACAAAT TGT ATT C CTT AT 2563 

II II II III I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 123564 GGCAGGCAGCACTTTCTAAAGCTGAATAGGCCCCCATCATCAACAAATTCTCTTTCTTAT 

123505 

Qy 2564 TGTGAAATTAATACCCTCAGGCTCCATTTTACTGCTTTGCTCTTTGTCTGCATTAAGAGA 2623 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 123504 TTTGAAATAAATACCCACAGGCTCCTTTGATTTATTATGTTCTTTCCCTACATTAGGAGC 

123445 

Qy 2624 GGAT GAGGAGAGCT GGT CAAACATT CCT T GT GT TAAA AAAAT C AAAC ATT CAT AT C C 2680 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 123444 T GGT GAGAT GAG CT AGT CTAACCCT GTT T GT GT TT AACAGACAAGC AAAC AGT CAT AT C C 

123385 

Qy 2681 ACAAAATT TT CT GCTAAAT GACT C CACACT CAGCCTT CT CTACCCT GAACT GAAT TAT C A 2740 

I I I I I I I I I I I I I I I I I I I I I I I I I I I III III I I I I 

Db 123384 AC AAAC AGAG- T GTT GAAAGAT CT C GCACT CAGC CTT CT C C GTT CTAAT T AGAACAAT C A 

123326 

Qy 2741 CCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCCACAGTTTAATGGTGTGGCCTTTCCA 2800 

I I I I I I I II I II III II I I I I I I I I I I 

Db 123325 CCATTCTCTAGCTGACTCAGAGTTTTAAACTTGCCCACATTTTATTAACAAGGCCTTTGA 

123266 

Qy 2801 CAT AAT C C AC ATTAAGTT CT GT GT T CCT GT GT T GTT GT GGAACTAAGGACAAC AC ACAGT 2860 

I II I I I I I III III I I I I I I I I I I I I I I I I I I I I I 

Db 123265 T AT AAT CC AGGCAAAT T CT CT GC CT CCCT AT GG GT T GT GAAGCT AC GAACAACAC C CAAT 

123206 

Qy 2861 ACTT GAATAAGGGT CC GGC CTT TT GTT - T GTTT T AGAGAAAGTTGT ATT C CACAC ACAAC 2919 

I I I I I I I I I I I I I I I III III I I I I I I I I I I 

Db 123205 GAT T GAAAAT GCAT C C AGC CT T CC GTT C C CTT GT TTT AGAGGATT T GT GC C CCAACAT AT 

123146 

Qy 2920 CT AAT AATT T CT T AT AAAAAT TTT AAACT ACAAAGCT AC AT T T TT ACT T GCTT GT AGC C G 2979 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 123145 GTCTAAATTTCTCATACAAACTTTACACTACACCTGTTTACTGTTGCTTGCTTGTAGCCA 

123086 

Qy 2980 TTTTTGTTTGCCTTTGGGATTC-GGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGT 3038 

I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 123085 GGTTTGGATAACTTTGGAATCCTGGGGTTTGGCTGTGGCCCTACTACGGTTTAGTTGTAT 

123026 



Qy 3039 C ATT TT T AT GAT GT CTGTAACAAC CCAACAAGGTAACT GAAGCT C C AGAGTT AAGGTTT C 3098 

II I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I I II 
Db 123025 AATTTCTACAGTGTCTGTAATGACCCAAGTGGGTGGCTGGAACATAAAAGTTACTAATTT 

122966 

Qy 3099 AGATTT CTAAAT GAAACT AT CTTTTTCAATT AC AT C CT GACTT GT ATAGAC AC AGC CAAA 3158 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I III 

Db 122965 GATTTTTTAAA CTTT T AAAAAAT ATT C CT GAC CT GT GT AGAT AC CAT C CAA 

122915 

Qy 3159 AAGAAACT GT TAAT AGCC AT C C GT C CAT GT AACT CT GT ATT T T ACTAAGGT AC CAAT AGC 3218 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 122914 AAGAAACT GTT AAC A- CT GT CT AT C CAT GTGATT CT GT CTT CT ACTAAT CTT C CAGT AGT 

122856 

Qy 3219 TCT T T C AT AGACTT GT GCT ACAAGAAGGT TAAAAGAC C AGTTT T - ATT T T CAGCATT CCT 3277 

I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 122855 TCTTTTGTTTAC-TGTGCTAAAAGAATGTCCAAAGACAACTTTTAATTTTCAGCATTCCT 

122797 

Qy 3278 CAT GCAT TT CAGT GGTAACCAAAAAATAAT T T GT CAAT T AAT AGTT GT GT GC CAAGC ACT 3337 

I I I I I I I I I I I I I I I I I I I II III I I I I I I I I I I I I I I 

Db 122796 CAT AC AT C CAAGT GGTAACT GAAAAGAT GAT T TAT CACT A GTGTGTGCCAAGAACT 

122741 

Qy 3338 CCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGT 3379 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 122740 CCTATTTTTTTGTTGTGTGTGTGTCTGTGTGTGTGTGTGTTT 122699 
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LOCUS 

DEFINITION 

ACCESSION 
VERSION 
KEYWORDS 
SOURCE 

ORGANISM 



REFERENCE 
AUTHORS 
TITLE 
JOURNAL 

REFERENCE 
AUTHORS 



AC114984 186417 bp DNA linear HTG 05-JUN-2003 

Mus musculus clone RP23-248F9, *** SEQUENCING IN PROGRESS 6 
unordered pieces . 
AC114984 

AC114984.6 GI: 30984634 

HTG; HTGS_PHASE1; HTGS_FULLTOP; HTGS_ACTIVEFIN . 
Mus musculus (house mouse) 
Mus musculus 

Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

1 (bases 1 to 186417) 
Birren,B., Nusbaum, C. and Lander, E. 
Mus musculus, clone RP23-248F9 
Unpublished 

2 (bases 1 to 186417) 

Birren,B., Linton, L., Nusbaum, C, Lander, E., 
Anderson, S., Barna,N., Bastien,V., Bloom,T., 
Boukhgalter, B. , Brown, A., Camarata,J., Campopiano, A. , Chang, J. , 
Chazaro,B., Choepel,Y., Colangelo, M. , Collins, S., Collymore,A. , 
Cook, A., Cooke, P., DeArellano, K. , Dewar,K., Diaz, J. S., Dodge, S., 
Faro,S., Ferreira,P., FitzHugh,W., Gage,D., Galagan,J., Gardyna,S., 
Ginde,S., Gord,S., Goyette,M., Graham, L. , Grand-Pierre, N . , 
Hagos,B., Horton,L., Hulme,W., Iliev, I., Johnson, R. , Jones, C, 
Kama t, A'. , Karatas,A. , Kells,C, LaRocque,K., Lamazares , R. , 



Ali, A. , Allen, N. 
Boguslavkiy, L. , 



Landers, T., Lehoczky,J., Levine,R., Lindblad-Toh, K. , Liu, G. , 
MacLean,C, Macdonald, P . , Major, J., Marquis, N., Matthews, C, 
McCarthy, M. , McEwan,P., McKernan,K., Meldrim,J., Meneus,L., 
Mihova,T., Mlenga,V., Murphy, T., Naylor,J., Nguyen, C, Nicol,R., 
Norbu,C, Norman, C.H. , O'Connor, T., 0 1 Donnell, P . , O f Neil,D., 
Oliver, J., Peterson, K., Phunkhang, P. , Pierre, N., Pollara,V\, 
Raymond, C. , Retta,R., Rieback,M., Riley, R., Rise,C, Rogov,P., 
Roman, J., Rosetti,M., Roy, A. , Santos, R. , Schauer,S., Schupback, R. , 
Seaman, S., Severy,P., Spencer, B. , Stange-Thomann, N . , Stojanovic,N. , 
Strauss, N., Subramanian, A. , Talamas,J., Tesfaye,S., Theodore, J. , 
Topham, K., Travers,M., Travis, N., Trigilio,J., Vassiliev, H . , 
Viel,R., Vo,A., Wilson, B., Wu,X., Wyman,D., Ye,W.J., Young, G., 
Zainoun,J., Zembek,L., Zimmer,A. and Zody,M. 
TITLE Direct Submission 

JOURNAL Submitted ( 14-MAR-2002 ) Whitehead Institute/MIT Center for Genome 
Research, 320 Charles Street, Cambridge, MA 02141, USA 
REFERENCE 3 (bases 1 to 186417) 

AUTHORS Birren,B., Nusbaum,C., Lander,E., Abouelleil,A., Allen,N.,
Anderson,S., Arachchi,H.M., Barna,N., Bastien,V., Bloom,T.,
Boguslavkiy,L., Boukhgalter,B., Camarata,J., Chang,J., Choepel,Y.,
Collymore,A., Cook,A., Cooke,P., Corum,B., DeArellano,K.,
Diaz,J.S., Dodge,S., Dooley,K., Dorris,L., Erickson,J., Faro,S.,
Ferreira,P., FitzGerald,M., Gage,D., Galagan,J., Gardyna,S.,
Graham,L., Grand-Pierre,N., Hafez,N., Hagopian,D., Hagos,B.,
Hall,J., Horton,L., Hulme,W., Iliev,I., Johnson,R., Jones,C.,
Kamat,A., Karatas,A., Kells,C., Landers,T., Levine,R.,
Lindblad-Toh,K., Liu,G., Lui,A., Mabbitt,R., MacLean,C.,
Macdonald,P., Major,J., Manning,J., Matthews,C., McCarthy,M.,
Meldrim,J., Meneus,L., Mihova,T., Mlenga,V., Murphy,T., Naylor,J.,
Nguyen,C., Nicol,R., Norbu,C., O'Connor,T., O'Donnell,P.,
O'Neil,D., Oliver,J., Peterson,K., Phunkhang,P., Pierre,N.,
Rachupka,A., Ramasamy,U., Raymond,C., Retta,R., Rise,C., Rogov,P.,
Roman,J., Schauer,S., Schupback,R., Seaman,S., Severy,P., Smith,C.,
Spencer,B., Stange-Thomann,N., Stojanovic,N., Stubbs,M.,
Talamas,J., Tesfaye,S., Theodore,J., Topham,K., Travers,M.,
Vassiliev,H., Venkataraman,V.S., Viel,R., Vo,A., Wilson,B., Wu,X.,
Wyman,D., Young,G., Zainoun,J., Zembek,L., Zimmer,A. and Zody,M.

TITLE Direct Submission 

JOURNAL Submitted (05-JUN-2003) Whitehead Institute/MIT Center for Genome
Research, 320 Charles Street, Cambridge, MA 02141, USA
COMMENT On May 22, 2003 this sequence version replaced gi:30023906.

All repeats were identified using RepeatMasker:
Smit, A.F.A. & Green, P. (1996-1997)
http://ftp.genome.washington.edu/RM/RepeatMasker.html

Genome Center
Center: Whitehead Institute/ MIT Center for Genome Research
Center code: WIBR
Web site: http://www-seq.wi.mit.edu
Contact: sequence_submissions@genome.wi.mit.edu

Project Information 

Center project name: L19035 
Center clone name: 248 F 9 



* NOTE: This is a 'working draft' sequence. It currently
* consists of 6 contigs. The true order of the pieces
* is not known and their order in this sequence record is
* arbitrary. Gaps between the contigs are represented as
* runs of N, but the exact sizes of the gaps are unknown.
* This record will be updated with the finished sequence
* as soon as it is available and the accession number will
* be preserved.
*        1  32842: contig of 32842 bp in length
*    32843  32942: gap of 100 bp
*    32943  61950: contig of 29008 bp in length
*    61951  62050: gap of 100 bp
*    62051  70963: contig of 8913 bp in length
*    70964  71063: gap of 100 bp
*    71064  74030: contig of 2967 bp in length
*    74031  74130: gap of 100 bp
*    74131 162809: contig of 88679 bp in length
*   162810 162909: gap of 100 bp
*   162910 186417: contig of 23508 bp in length.
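The contig and gap coordinates listed above can be cross-checked mechanically. The following is a minimal sketch in Python, assuming the coordinates are 1-based and inclusive (as the listed lengths suggest); the names `SEGMENTS`, `RECORD_LENGTH`, and `check_layout` are illustrative and not part of the record.

```python
# Contig/gap layout as listed in the COMMENT block (1-based, inclusive).
SEGMENTS = [
    (1, 32842, "contig"),
    (32843, 32942, "gap"),
    (32943, 61950, "contig"),
    (61951, 62050, "gap"),
    (62051, 70963, "contig"),
    (70964, 71063, "gap"),
    (71064, 74030, "contig"),
    (74031, 74130, "gap"),
    (74131, 162809, "contig"),
    (162810, 162909, "gap"),
    (162910, 186417, "contig"),
]
RECORD_LENGTH = 186417  # from the source feature span 1..186417


def check_layout(segments, record_length):
    """Verify that the segments tile 1..record_length; return total contig bases."""
    expected_start = 1
    contig_bases = 0
    for start, end, kind in segments:
        if start != expected_start:
            raise ValueError(f"segment starts at {start}, expected {expected_start}")
        if kind == "contig":
            contig_bases += end - start + 1
        expected_start = end + 1
    if expected_start - 1 != record_length:
        raise ValueError("segments do not cover the full record")
    return contig_bases


if __name__ == "__main__":
    print(check_layout(SEGMENTS, RECORD_LENGTH))
```

The check raises an error if any two adjacent segments overlap or leave a hole, which is a quick way to catch a digit lost or altered in transcription.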



FEATURES Location/Qualifiers
source 1..186417
/organism="Mus musculus"
/mol_type="genomic DNA"
/db_xref="taxon:10090"
/clone="RP23-248F9"
/clone_lib="RPCI-23 Female Mouse BAC"
BASE COUNT 54373 a 36475 c 36595 g 58359 t 615 others
ORIGIN



Query Match 24.6%; Score 853.2; DB 2; Length 186417; 

Best Local Similarity 70.7%; Pred. No. 7.5e-174; 

Matches 1473; Conservative 0; Mismatches 523; Indels 86; Gaps 22; 
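
The summary figures above appear to be related by simple ratios. The sketch below, in Python, assumes that Query Match is the score divided by the perfect score (3468 for this query, per the run header) and that Best Local Similarity is matches divided by the total of matches, mismatches, and indels. Both formulas are inferred from the numbers in this report rather than taken from the search program's documentation, and the function names are hypothetical.

```python
PERFECT_SCORE = 3468  # "Perfect score" from the run header


def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect (self-match) score. Assumed formula."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns. Assumed formula."""
    return 100.0 * matches / (matches + mismatches + indels)


if __name__ == "__main__":
    # Figures reported for this result: Score 853.2; Matches 1473;
    # Mismatches 523; Indels 86.
    print(round(query_match_pct(853.2), 1))
    print(round(best_local_similarity_pct(1473, 523, 86), 1))
```

Under these assumptions the reported 24.6% and 70.7% are reproduced, and the same ratios can be applied to the figures given for the other results.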

Qy 1347 GT GAGC CT T CCAGAT GGT CACC CT C ACACT GCGAT T GCT GCT GCAAGAAT GGCAAAGGT G 1406 

I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I II I 
Db 172660 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 

172719 



Qy 1407 ACAAAGAAGGGGAGAGC G GCAC GT CTT GCAATGAC CTCT CCAC AT CT AGCT GC GACAGC C 1466 

III I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 

Db 172720 ACA AAGGAGAGAGC GGC AC CT C CT GCAATGAC CT GT CC ACT T CCAGCT GT GACAGC C 

172776 



Qy 1467 AGT CTGAGGC C AGCT CT C C C C AGGAGAC GGT C AT CT GT GGT C CC GTGAC AC GC C AGAC C A 1526

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II II I I I I I I I II 
Db 172777 AGT CAGAGGC CAGCT CT C C GC AGGAGAC GGT GAT CT GT GGGC CT GTAACGCGC CAGAGC A 

172836 



Qy 1527 AC AT CC AGACT CT GGAC C GT CCCAT CAAGAAGGG C C CT GT C CAGCT GAT C CAACAGTCAG 1586

I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II II I I I I II I I I I I I I I I I I I I 
Db 172837 AC AT CCAGACT CT GGAT C GGCCCAT CAAGAAAGGT C CGGT GC AGCT GAT C CAACAGTCAG 

172896 

Qy 1587 AGAT GCGGC GGAAAAGC GACTT ACT C C GGAT T CT GACT T CAGGCT CC AGGGAAT CGAAC A 1646 

I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 172897 AGAT GAGGC GGAAAAGT GAC CT GCT C CGGACT CT GAC GT CAGGCT CCAGGGAGT CGAACA

172956 



Qy 1647 T GAGCAGCAAAAAAAAAGCT GT TAAAGAAAAGCT CT CAATT GAGGAGGAGCT GGAGAAAT 1706
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I II I I I I I I I I I I I I I I I 



Db 172957 TAAGCAGCAAAAAGAAAGCT GC GAAGGAAAAGCT CT C CAT C GAGGAAGAGCT G GAGAAAT 

173016 

Qy 1707 GT AT C C AGGAT TT CCT AAAAAAAAAAATT C C AGAT CGGT T T C CT GAGAGAAAAC AT C CTT 1766 

I I I I I I I II I I I I I I II I II I I I I I I II I I I I I II I I I I I I I I I I I I I I I I II 
Db 173017 GT AT C CAGGATTT CTT GAAGATAAAAAT T C C AGAT C G CTT C C CT GAGC GAAAAC AT C CT T 

173076 

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II I II 

Db 173077 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG 

173125 

Qy 1827 AAAAAAAAG AG T CAT T T T G AAAT T AAC C T C AT AAAAG G AAT T CAT AT T T T AAAG G AAAAA 1886 

I I II I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I II I I II 

Db 173126 GT AGT CGCCACT T T GAAAT AAAC CT C C C CAAAGGAAGAC AT AT GTT AAAG GAAAAA 

173181 

Qy 1887 AAT ACAACT AAT GAT GCACATTT CTT AGAAC ACAATAGT CC ATT GAT AT ACT ACT GCCT A 1946 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 173182 T A- ACAACT AAC GGT C C AC ATTTGTT AGAT C ACAAT - GT CC AT T GAT GT ACT ACT GC CT A 

173239 

Qy 1947 CTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAA 2006 

I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I 
Db 173240 CTTTGCCTAGCTCACCTTAACGTGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAA

173299 

Qy 2007 GT ACAAGAAAAT CT T TT T T AGT TAT T T G TTTGTTTACTTCGTCCCATGTGCTAACTA 2063 

III I II III I I I I I I I I I I I I I I I I II I I I I I I I I I I 

Db 173300 CCAGAAACGAGCTCTTAGTTGTCCTTTGTCTTTTATTTACTTGGTCCCATGTGCTGAGAA 

173359 

Qy 2064 T CT T - AT AT AT AAT GAGAGC CAGCTAC GTAAAAGT AGCT GAGAGGCCT T GGGAGT C ATT T 2122

I I I I I III II III I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 
Db 173360 T CT TAAGAT ACAACAAGAAC CAGCTAC GT GT GAGT AGCT CAC AGGCTT T GGGAAT CATT G 

173419 

Qy 2123 ATCCCAAACTGGGTTTTTT CTCTCATCCTTCTAC 2156 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I 

Db 173420 ATCCCAAACCAGGTTTTTTTGTTTTGTTTTGTTTTGTTTTGTTTACTCTCATTTTTCTGC 

173479 

Qy 2157 CTCCCTCCTTTGA--ATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTG 2214

I I I I I I I I I I III II Mill I I I I I III I I I I I I I I I 
Db 173480 CT C CT C C CCTT GAC CAAGAAT GGACAGTT GAAGGAGAT AT AAC C CGGT GGCTT AT GTT AA 

173539 

Qy 2215 GAATTTTTAATTTTGGTTTTTCCT TTTGTTTATGGGGTT — GGGGGGAATGGCAGAT 2269 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I II 
Db 173540 GAAATTATCCTTTTCCCTTTCCTTTTGTTTGTTTATGGGGTTGAGGGGAGAATGGCAAAT 
173599 

Qy 2270 TT AT AT GACTTTTCACT CAAAT CT ATAT GT GCCAGTTTATATT GACTCCGTAT GCAT GAG 2329 

II I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I 
Db 173600 T T GT AT GAT TTT T C ACTAAAAT CT CT AT GT GCC AGGT T CT AT T GACTT T GT AT GCAT GAG 
173659 



Qy 2330 T ATT T GT GCAAC ACAAGC A- CAACTAAGT AT GT AT AT AC ACAT GAC GC ACAC GAT GCCAG 2388 

I I I I I III II I I I I I I I II I I I I III M 
Db 173660 C GTT T CT GACACAAGCAC AGT AT AT GT CT GTAT AT AT GC ACAAAGAAT GC ACAC GAC CT A 
173719 

Qy 2389 GGCCTAGACCTCCCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCA 2448 

III I I I I I I I I I I I I I I I I I I I I I I I I III I I I II 

Db 173720 GGGCCTGGACAGCAGAGGGCTAACATCTTACTATCAGCTGCCC- CTACAAGAGCACTTCA 

173778 

Qy 2449 GATGGATGAGCTTCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTT 2508

II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I III 

Db 173779 GACAACCAAGCCTCTGCCTATTTATTAAAACCCTCCTGGGCAGATTTCCCAGCCTCCCTT 

173838 

Qy 2509 CACAACACTTTCT AACAT CAAAT GACT CT CAT CAT CAACAAATT GT ATT C CT TAT 2563 

II II II III I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 173839 GGCAGGCAGCACTTTCTAAAGCTGAATAGGCCCCCATCATCAACAAATTCTCTTTCTTAT 
173898 

Qy 2564 TGTGAAATTAATACCCTCAGGCTCCATTTTACTGCTTTGCTCTTTGTCTGCATTAAGAGA 2623 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I til 

Db 173899 TT T GAAATAAAT AC C C AC AGGCTC CTT T GATT TATT AT GT T CT T T C CCT AC ATT AGGAGC 

173958 

Qy 2624 GGAT GAGGAGAGCT GGT CAAACAT T CCTT GT GT TAAA AAAAT C AAAC AT T CAT AT C C 2680 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 173959 TGGT GAGATGAGCT AGT CTAACCCT GT TT GTGT TTAACAGACAAGCAAAC AGT CAT AT C C 

174018 

Qy 2681 ACAAAATT TT CT GCTAAAT GACT C C ACACT CAGCCT T CT CT ACC CT GAACT GAATTAT C A 2740 

I I II I I I I I I I I I I I I I I I I II I I I I I III III I I I I 
Db 174019 ACAAAC AGAG- TGT T GAAAGAT CT C GCACT CAGCCT T CT C C GTT CTAAT T AGAACAAT C A 
174077 

Qy 2741 CCCTTTTCTCCATGTTTTCAGAGTTCTTACTGCCCACAGTTTAATGGTGTGGCCTTTCCA 2800 

II I I I I I III II I I I I I I I I I I I I I I I 
Db 174078 CCATTCTCTAGCTGACTCAGAGTTTTAAACTTGCCCACATTTTATTAACAAGGCCTTTGA 
174137 

Qy 2801 C AT AAT CC ACATT AAGTT CT GT GT T CCT GT GT T GT T GT GGAACTAAGGACAACAC ACAGT 2860 

I I I I I I I I III III I I II I I I I I I I I I I I I I I I I I 

Db 174138 T AT AAT CCAGGCAAAT T CT CT GCCT CC CT AT GGGT T GT GAAGCT AC GAACAACAC C CAAT 

174197 

Qy 2861 ACTT GAATAAGGGT C C GGC CT TTT GT T - T GTT TT AGAGAAAGTT GT ATT C CACAC ACAAC 2919 

I I I I I I I I I I I I I I I III III I I I I I I I I I I 

Db 174198 GATT GAAAAT GCAT CCAGC CT T CCGT T C CCTT GTTT T AGAGGATTT GT GC CC CAACAT AT 

174257 

Qy 2920 CTAAT AATTT CTT ATAAAAAT TTTAAAC T ACAAAGCT AC AT TT T TACT T GCT TGT AGCCG 2979 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 174258 GTCTAAATTTCTCATACAAACTTTACACTACACCTGTTTACTGTTGCTTGCTTGTAGCCA 

174317 

Qy 2980 TTTTTGTTTGCCTTTGGGATTC-GGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGT 3038 



Db 174318 GGTTTGGATAACTTTGGAATCCTGGGGTTTGGCTGTGGCCCTACTACGGTTTAGTTGTAT 

174377 

Qy 3039 C ATTTTT ATGAT GT CT GT AACAAC CCAACAAGGT AAC T GAAGCT C CAGAGT TAAG GT T T C 3098 

I I I I II I I I I I I II I I I I I II III I I I I I I I I I I I II 
Db 174378 AATTTCTACAGTGT CT GTAATGACCCAAGT GGGTGGCT GGAACATAAAAGTTACTAATTT 

174437 

Qy 3099 AGAT TT CT AAAT GAAACT AT CT T TT T CAATT ACAT CCT GACTT GT AT AGAC ACAGCCAAA 3158 

111 lilt I I I I I II I I I I I I I I I I II I I I I III 

Db 174438 GATTTTTTAAA CTTTTAAAAAATATTCCTGACCTGTGTAGATACCATCCAA 

174488 

Qy 3159 AAGAAACT GTT AAT AGCCAT C C GT C CAT GTAACTCT GT ATTTT ACTAAGGT ACCAAT AGC 3218 

I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 174489 AAGAAACT GTTAAC A- CT GTCTAT CCATGT GATTCT GTCTT CT ACTAAT CTT CCAGTAGT 

174547 

Qy 3219 T CT TT C AT AGACTT GT GCT ACAAGAAGGT T AAAAGAC CAGT TTT - AT TTT C AGCATT CCT 3277 

I I I I I I II I I II I II I I I I I II I I I I II I I I I I I I I I II I I I I I I I I I 
Db 174548 TCTTTTGTTTAC-TGTGCTAAAAGAATGTCCAAAGACAACTTTTAATTTTCAGCATTCCT 

174606 

Qy 3278 CAT GC ATT T CAGT GGTAAC CAAAAAAT AATTT GT CAAT T AATAGTT GT GT GCCAAGC ACT 3337 

I I I I II I I I I I I I I I I I I I II III I I I I I II I I I I I I I 

Db 174607 CAT AC AT CCAAGT GGTAACT GAAAAGAT GATTT AT CACT A GTGTGTGCCAAGAACT 

174662 

Qy 3338 CCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGT 3379 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 174663 CCTATTTTTTTGTTGTGTGTGTGTCTGTGTGTGTGTGTGTTT 174704 



RESULT 10 

AC117867/c 

LOCUS AC117867 242679 bp DNA linear HTG 11-OCT-2002
DEFINITION Rattus norvegicus clone CH230-376I9, *** SEQUENCING IN PROGRESS
11 unordered pieces.
ACCESSION AC117867
VERSION AC117867.4 GI:23618130
KEYWORDS HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE Rattus norvegicus (Norway rat)
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.
REFERENCE 1 (bases 1 to 242679)
AUTHORS Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
Davila,M.L., Davis,C., Davy-Carroll,L., De Anda,C., Dederich,D.,
Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,
Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reigh,R.,
Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K.,
Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J.,
Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F.,
Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K.,
Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V.,
Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von
Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O.,
Weinstock,G. and Gibbs,R.A.
TITLE Direct Submission
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 242679)
AUTHORS Worley,K.C.
TITLE Direct Submission
JOURNAL Submitted (11-APR-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
REFERENCE 3 (bases 1 to 242679)
AUTHORS Rat Genome Sequencing Consortium.
TITLE Direct Submission
JOURNAL Submitted (11-OCT-2002) Human Genome Sequencing Center, Department
of Molecular and Human Genetics, Baylor College of Medicine, One
Baylor Plaza, Houston, TX 77030, USA
COMMENT On Oct 9, 2002 this sequence version replaced gi:21746224.

The sequence in this assembly is a combination of BAC based reads
and whole genome shotgun sequencing reads assembled using Atlas
(http://www.hgsc.bcm.tmc.edu/projects/rat/). Each contig described
in the feature table below represents a scaffold in the Atlas
assembly (a 'contig-scaffold'). Within each contig-scaffold,
individual sequence contigs are ordered and oriented, and separated
by sized gaps filled with Ns to the estimated size. The sequence
may extend beyond the ends of the clone and there may be sequence
contigs within a contig-scaffold that consist entirely of whole
genome shotgun sequence reads. Both end sequences and whole genome
shotgun sequence only contigs will be indicated in the feature
table.

Genome Center 

Center: Baylor College of Medicine 
Center code: BCM 

Web site: http://www.hgsc.bcm.tmc.edu/

Contact: hgsc-help@bcm.tmc.edu 
Project Information 

Center project name: GTZA 

Center clone name: CH230-376I9 
Summary Statistics 

Assembly program: Phrap; version 0.990329 

Consensus quality: 188097 bases at least Q40 

Consensus quality: 190770 bases at least Q30 

Consensus quality: 192614 bases at least Q20 

Estimated insert size: 191086; sum-of-contigs estimation 

Quality coverage: 6x in Q20 bases; sum-of-contigs estimation 



* NOTE: Estimated insert size may differ from sequence length
* (see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html).
* NOTE: This is a 'working draft' sequence. It currently
* consists of 11 contigs. The true order of the pieces
* is not known and their order in this sequence record is
* arbitrary. Gaps between the contigs are represented as
* runs of N, but the exact sizes of the gaps are unknown.
* This record will be updated with the finished sequence
* as soon as it is available and the accession number will
* be preserved.
*        1 185634: contig of 185634 bp in length
*   185635 185734: gap of unknown length
*   185735 189337: contig of 3603 bp in length
*   189338 189437: gap of unknown length
*   189438 198672: contig of 9235 bp in length
*   198673 198772: gap of unknown length
*   198773 202047: contig of 3275 bp in length
*   202048 202147: gap of unknown length
*   202148 203477: contig of 1330 bp in length
*   203478 203577: gap of unknown length
*   203578 204617: contig of 1040 bp in length
*   204618 204717: gap of unknown length
*   204718 205848: contig of 1131 bp in length
*   205849 205948: gap of unknown length
*   205949 207199: contig of 1251 bp in length
*   207200 207299: gap of unknown length
*   207300 208535: contig of 1236 bp in length
*   208536 208635: gap of unknown length
*   208636 211249: contig of 2614 bp in length
*   211250 211349: gap of unknown length
*   211350 242679: contig of 31330 bp in length.



FEATURES Location/Qualifiers
source 1..242679
/organism="Rattus norvegicus"
/mol_type="genomic DNA"
/db_xref="taxon:10116"
/clone="CH230-376I9"
misc_feature 1..1668
/note="wgs_end_extension
clone_end:Sp6"
misc_feature 1914..2738
/note="clone_boundary
clone_end:Sp6
site:MboI
end_sequence:RXAWM53TV"
misc_feature 8953..9899
/note="clone_boundary
clone_end:T7
site:MboI
end_sequence:RXAWM53TJ"
misc_feature 185735..187055
/note="wgs_end_extension
clone_end:T7"
misc_feature 188311..189337
/note="wgs_end_extension
clone_end:T7"
misc_feature 198773..199845
/note="wgs_end_extension
clone_end:T7"
BASE COUNT 59286 a 39026 c 39489 g 55897 t 48981 others
ORIGIN



Query Match 23.6%; Score 819.4; DB 2; Length 242679; 

Best Local Similarity 83.8%; Pred. No. 1.5e-166; 

Matches 995; Conservative 0; Mismatches 166; Indels 26; Gaps 5;



Qy 187 ATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTT GCAT GTGAGCTT GAT GGAAGAT 246 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 239544 AGACAGACCT CTGGGTAACTGGGCATTTGGCCTT CTT GCCT ACCGAAC CTAAT GGAAGAC 

239485 



Qy 247 T GGAT AT AGACGAGTT GATT AT ATTTTATGAAGTAGCAGCTCACTACCATCCACCAT 303 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 239484 TGAATGTAGACAAGCT GATT ATT AC ATT CT GT GAAGTAAGAGCTCACTACCAGCCAGCTT

239425 



Qy 304 C C AGGGT TTAAACT ACT T TT T C AGCAT CACTT CACCT GT GGACT CT TAT ACAT T TT GATT 363 

I II II I I I I I I I I I I I I I I I I I I I I I I I I I II I III! II 
Db 239424 TAAACTTCTCGTCCA-GTGTCCAGCATTTCTTCACCTGTGGACTCTCCTAAAATTTGCTT 

239366 



Qy 364 TCTTGGG GGAAAAAT ACT GGGATAAGAGGAGGT CATT TTT TAA 406 

I I I I I I I I I I I I II II I II I I I I II Ml II 

Db 239365 T ATT GGGAAAAT AAAAAACAAAATAAAACAAAAC CT GGGT AAGAGGAGGT AATTAACAAA 

239306 



Qy 407 --TAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTC 464
II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 



Db 239305 AGAAAAGTTAGTACCTTTTCCTTACCTTACCAGTGGATGCAAAGGCCAGGGCTGTAACCC 

239246 



Qy 465 CATTGGATTGCAC CTTTAAATCAAAATAGCAGCAGCAGAAGAAAGGGACAAT GGCTCT GA 524 

I I I I I I I I I I I I I I I III I I II I I I I I I I II I I I I I I I I I I I II I I I I 

Db 239245 AGTTGGATTGCACCTTAAGTTCCA GGAAGCT GCAGAAGAAAGGGACAAT GGCT CT GA 

239189 

Qy 525 GT GGAAACT GT AGT C GT TAT T AT CCT C GAGAACAAGGGT C C GCAGT T CC CAACT CCTT C C 584 

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I 
Db 239188 GTGGGAACTGTAGCCGTTATTATCCTCGGGACCAAGGGGCTGCTGTTCCCAACTCTTTCC 

239129 

Qy 585 CTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGA 644 

I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 239128 CT GAAGT C AT AGAGCTGAAT GT TGGGGGC C AGGT T TACTT TACT C GC CAT T C CAC ATT AA 

239069 

Qy 645 TAAGC AT C C CT CATT CC CT C CT GT GGAAAAT GT T TT C C CCAAAGAGAGAC AC GGCTAAT G 704 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 239068 TAAGT AT C C C C CATT CTCTCCTGT GGAAAAT GT T TT C C CCAAAGAGAGACACT GCTAAC G 

239009 

Qy 705 ATCT AGCCAAGGACT CCAAGGGAAGGTTTTT CATTGACAGAGATGGATT CTT GTTCCGTT 764

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I 

Db 239008 ATCTAGCCAAGGACT CCAAGGGAAGGTTTTT CAT CGACAGAGATGGCTTTCT GTTCCGTT 

238949 

Qy 765 ATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAG 824

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 238948 AT ATT CTGGACT AT CTCAGGGACAGG CAGGT GGT CCT GCCT GAT CACTTT C CAGAAAGAG 

238889 

Qy 825 GAAGACT GAAAAGGGAAGCT GAAT ACT T C CAGCT C C C AGACT T GGT CAAACT C CTGACC C 884 

I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I III I I I I II I I I I I I I III 
Db 238888 GAAGGCT GAAAAGAGAAGCT GAGT AT TT C CAGCT C C CT GACCT C GT CAAACT C CT GGCC C 

238829 

Qy 885 CCGATGAAAT CAAGCAAAGCCCAGAT GAATTCT GCCACAGTGACTTT GAAGATGCCTCCC 944 

I II III I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 238828 CT GAGGAAGT CAAGCAAAGT CCGGAT GAGT T CT GCC AC AGTGACT T C GAAGAT GCCTC C C 

238769 

Qy 945 AAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGG 1004 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I III I I I I I I I I I I I I 
Db 238768 AAGGAAGCGACACAAGAATCTGCCCCCCCTCTTCGCTGCTTCCTCATGACCGAAAGTGGG 

238709 

Qy 1005 GT T T CATT ACT GT GGGT T AC AGAGGAT CCT GCAC CT T GGGCAGAGAGGGAC AGGCAGAT G 1064 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINIMI 
Db 238708 GTTTTATTACTGTGGGTTACAGGGGATCCTGTACCTTGGGCAGAGAGGGGCAAGCAGATG 

238649 

Qy 1065 CCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAG 1124 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 238648 CCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGCGGAAGGATTTCCTTGGCAAAGGAAG 

238589 



Qy 1125 T CTTT GGAGAAACTTTGAATGAAAGCAGAGACCCT GATCGAGCCCCAGAAAGATACACCT 1184 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I II I I I I I 
Db 238588 TTTTTGGT G AAAC T T T G AAT G AAAGT AG AG AC C C C GAC C GAG C T C C AG AAAG AT AC AC CT 

238529 

Qy 1185 C CAGAT T TT AT CT CAAATT C AAGCAC CT GGAAAGGGCTT T T GAT AT GTT GT C AGAGT GT G 1244 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 238528 C CAGAT T TT AT CT CAAGTT CAAACAT CT GGAAAGAGCTTT T GAT AT GTT GT CAGAGT GT G 

238469 

Qy 1245 GATT CCACATGGTGGCCTGTAACT CAT CGGTGACAGCATCTTTCATCAACCAAT ATACAG 1304 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 238468 GATTCCACATGGTGGCCTGTAACTCCTCGGTTACAGCATCTTTTGTCAACCAGT ATACAG

238409 

Qy 1305 AT GACAAGAT CT GGT CAAGCT AC ACT GAAT AT GT CT T CT ACC GT GAG 1351 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 238408 AAGACAAGAT CT GGT C GAGCT AT ACT GAAT AC GT CTT CT AC C GTAAG 238362 



RESULT 11 
AC098707 

LOCUS AC098707 230128 bp DNA linear ROD 21-JUN-2002
DEFINITION Mus musculus clone RP23-1I13, complete sequence.
ACCESSION AC098707
VERSION AC098707.2 GI:19909459
KEYWORDS HTG.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 230128)
AUTHORS McPherson,J.D. and Waterston,R.H.
TITLE The sequence of Mus musculus clone
JOURNAL Unpublished
REFERENCE 2 (bases 1 to 230128)
AUTHORS McPherson,J.D. and Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (31-OCT-2001) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
REFERENCE 3 (bases 1 to 230128)
AUTHORS McPherson,J.D. and Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (03-APR-2002) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
REFERENCE 4 (bases 1 to 230128)
AUTHORS McPherson,J.D. and Waterston,R.H.
TITLE Direct Submission
JOURNAL Submitted (21-JUN-2002) Genome Sequencing Center, 4444 Forest Park
Parkway, St. Louis, MO 63108, USA
COMMENT On Apr 3, 2002 this sequence version replaced gi:16554404.

Genome Center
Center: Washington University Genome Sequencing Center
Center code: WUGSC
Web site: http://genome.wustl.edu/gsc/index.shtml
Contact: submissions@watson.wustl.edu
Project Information
Center project name: M BA0001I13



FEATURES Location/Qualifiers
source 1..230128
/organism="Mus musculus"
/mol_type="genomic DNA"
/db_xref="taxon:10090"
/clone="RP23-1I13"
BASE COUNT 69218 a 47164 c 45440 g 68306 t
ORIGIN



Query Match 23.4%; Score 811.8; DB 10; Length 230128;
Best Local Similarity 83.1%; Pred. No. 6.8e-165;
Matches 982; Conservative 0; Mismatches 172; Indels 27; Gaps 4;



Qy 187 AT AC GGAC AT CT GAGTAACT GGGGAATT GGCCT GC CTT GC AT GT GAGCTT GAT GGAAGAT 246
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II
Db 176288 AGACAGACCTCT GAGTAACT GGGCATTT GGCCTT CTT GCCT ACAGAGCCTAAT GGAAGAT 176347

Qy 247 T GGAT AT AGAC GAGT T GATT AT ATTTTATGAAGTAGCAGCTCACTACCATCCACCAT 303
II II MM III I I I I I I II I I I I I I I I I II I I I II I I II I
Db 176348 T GAAT GT AGAGGAGCC GATT AT TAC AT C CT GT GAAGT AAGAGCTC GCT AT C AGC C A 176403

Qy 304 CCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGATT 363
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 176404 GCTTTTAACTTCTCCTCCAGCATTATTTCAACTGTGGACTCTCCTAAAATCTGCTT 176459

Qy 364 TCTTGGGGGAA AAATACTGGGATAAGAGGAGGTCATTTTTTAATAAG 410
I I I I I I I I I I I I I I I I I I I I I II III
Db 176460 T GTT GGGAGAATAAAAACCAAAACAAAACCAGGGTAAGAAGGGGTGATTTAAAAAGAAAA 176519

Qy 411 T TAGCAT C CTT T T C C CTTT CTT ACAAGTT GAT CCAAAGGATAAGGCT GTGACT CC ATT GG 470
I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 176520 GTTAGTATCTTTATCTTACCTCACTCGTGGATGCAAAGTCTAGGGCTGTAACTCAGTTGG 176579

Qy 471 ATT GC AC CTTTAAAT CAAAAT AGCAGC AGC AGAAGAAAGGGACAAT GGCT CT GAGT GGAA 530
I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 176580 ATTGCACCTTTAATTCCA GGAGGCT GCAGAAGAAAGGGACAAT GGCT CTAAGTGGGA 176636

Qy 531 ACT GT AGT C GT TAT T AT CCT C GAGAACAAGGGT C C GC AGTT C C CAACT C CT TC C CT GAGG 590
I I I I I I I I I I I I I I I I I I I I I II II III I II I I I I I I I I I I I I I I I I I I I I I
Db 176637 ACTGTAGCCGTTATTATCCTCGGGACCAGGGGGCTGCTGTTCCCAACTCCTTCCCTGAAG 176696

Qy 591 T GGT AGAGCTGAAT GT C GGGGGT CAAGT TT AT TTT ACT CGC CATT CC AC ATT GAT AAGCA 650
I I I I I I I I I I I I I I II II II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I
Db 176697 T CAT AGAGCTGAAT GTT GGCGGCCAGGTTTACTTCACT CGCCATT CCACATTAAT AAGCA 176756



Qy 651 T C C CT CAT T C C CT C CT GT GGAAAAT GTT TTC CCCAAAGAGAGAC AC GGCT AAT GAT CT AG 710 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 176757 T CCCCCATT CT CT CCTGT GGAAAAT GTT CTCCCCAAAGAGAGACACT GCTAACGAT CT AG 

176816 

Qy 711 C CAAGGACT C CAAGGGAAGGTT TTT CAT T GAC AGAGAT GGAT T CTT GTT CC GT TAT AT T C 770 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 176817 C CAAGGACT CCAAGGGAAGGT T TTT CAT T GACAGAGAT GGCT T T CT GTT C C GT TAT AT T C 

176876 

Qy 771 T GGACT AT CT CAGGGACAGGC AGGT GGT C CT GCCT GAT CACT TTC C AGAAAAAGGAAGAC 830 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I 
Db 176877 T GGACT AT CT CAGGGACAGGC AGGT GGT C CT GC CT GAT CACT TTC C AGAAAGAGGAAGGC 

176936 

Qy 831 TGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATC 890 

I II I I I I I I I I I I I I I I I I I I I II I I I I I III I I I I I I I I I I I I I I I I I I I I I 
Db 176937 TGAAAAGAGAAGCTGAGTACTTCCAGCTCCCTGACCTCGTCAAACTCCTGGCCCCCGAGG 

176996 

Qy 891 AAAT CAAGCAAAGCCCAGAT GAATT CTGCCACAGT GACTTTGAAGATGCCT CCCAAGGAA 950 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 176997 ATGTCAAGCAAAGCCCGGATGAGTT CTGCCACAGT GACTTCGAAGATGCCT CCCAAGGAA

177056 

Qy 951 GCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCA 1010 

I I I I I I I I I II I I I I I I I I I I M M I I I I I III I I I I I I I I I I I I I I I I I I I 
Db 177057 GCGACACGAGAATCTGCCCCCCCTCTTCGCTGCTTCCTCACGACCGCAAGTGGGGTTTTA 

177116 

Qy 1011 TTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGT 1070 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I 
Db 177117 TTACTGTGGGTTACAGGGGATCCTGTACCTTGGGCAGAGAGGGGCAAGCAGATGCCAAGT 

177176 

Qy 1071 TT CGGAGAGT T CC CC GGAT T TT GGT TT GT GGAAGGATT T C CT T GGCAAAAGAAGT CTTT G 1130 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 177177 TCCGGAGAGTCCCCCGGATTTTGGTTTGCGGAAGAATTTCCTTGGCAAAAGAAGTCTTTG 
177236 

Qy 1131 GAGAAACTT T GAAT GAAAGCAGAGAC C CT GAT CGAGC C CC AGAAAGATACAC CT CCAGAT 1190 

II I I I I I I I I I I I I II I I I II I I I I I I II I I I I I I I I I I I M I I I I I I II I I I I I I 
Db 177237 GAGAAACTTTGAATGAAAGTAGAGACCCCGACCGAGCTCCAGAAAGATACACCTCCAGAT 
177296 

Qy 1191 T T T ATCT CAAAT T C AAGCAC CT GGAAAGGGCTTTT GAT AT GT T GT C AGAGT GT GGAT T CC 1250 

I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 177297 TTTATCT CAAGTTTAAACAT CT GGAAAGAGCTTTT GATATGTT GT CAGAGT GT GGATT CC 

177356 

Qy 1251 ACAT GGT GGCCT GTAACT CAT CGGT GACAGCAT CTTT CAT CAAC CAAT AT AC AGAT GAC A 1310 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I 
Db 177357 ACAT GGT GGC CTGTAACT C CT CGGT T AC AGCAT CTT TT GT CAAC C AGT AT AC AGAAGAT A 

177416 



Qy       1311 AGATCTGGTCAAGCTACACTGAATATGTCTTCTACCGTGAG 1351
              |||||||||| ||||| ||||| || |||||||| ||| ||
Db     177417 AGATCTGGTCGAGCTATACTGAGTACGTCTTCTATCGTAAG 177457



RESULT 12 

AC112599 

LOCUS       AC112599 249703 bp DNA linear HTG 21-SEP-2002
DEFINITION  Rattus norvegicus clone CH230-112A20, *** SEQUENCING IN PROGRESS
ACCESSION   AC112599
VERSION     AC112599.4 GI:23266003
KEYWORDS    HTG; HTGS_PHASE2; HTGS_DRAFT; HTGS_ENRICHED.
SOURCE      Rattus norvegicus (Norway rat)
  ORGANISM  Rattus norvegicus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
            Rattus.
REFERENCE   1 (bases 1 to 249703)
  AUTHORS   Muzny,D.Marie., Metzker,M.Lee., Abramzon,S., Adams,C., Alder,J.,
            Allen,C., Allen,H., Alsbrooks,S., Amin,A., Anguiano,D.,
            Anyalebechi,V., Aoyagi,A., Ayodeji,M., Baca,E., Baden,H.,
            Baldwin,D., Bandaranaike,D., Barber,M., Barnstead,M., Benahmed,F.,
            Biswalo,K., Blair,J., Blankenburg,K., Blyth,P., Brown,M.,
            Bryant,N., Buhay,C., Burch,P., Burrell,K., Calderon,E.,
            Cardenas,V., Carter,K., Cavazos,I., Ceasar,H., Center,A.,
            Chacko,J., Chavez,D., Chen,G., Chen,R., Chen,Y., Chen,Z., Chu,J.,
            Cleveland,C., Cockrell,R., Cox,C., Coyle,M., Cree,A., D'Souza,L.,
            Davila,M.L., Davis,C., Davy-Carroll,L., De Anda,C., Dederich,D.,
            Delgado,O., Denson,S., Deramo,C., Ding,Y., Dinh,H., Divya,K.,
            Draper,H., Dugan-Rocha,S., Dunn,A., Durbin,K., Duval,B., Eaves,K.,
            Egan,A., Escotto,M., Eugene,C., Evans,C.A., Falls,T., Fan,G.,
            Fernandez,S., Finley,M., Flagg,N., Forbes,L., Foster,M., Foster,P.,
            Fraser,C.M., Gabisi,A., Ganta,R., Garcia,A., Garner,T., Garza,M.,
            Gebregeorgis,E., Geer,K., Gill,R., Grady,M., Guerra,W., Guevara,W.,
            Gunaratne,P., Haaland,W., Hamil,C., Hamilton,C., Hamilton,K.,
            Harvey,Y., Havlak,P., Hawes,A., Henderson,N., Hernandez,J.,
            Hernandez,R., Hines,S., Hladun,S.L., Hodgson,A., Hogues,M.,
            Hollins,B., Howells,S., Hulyk,S., Hume,J., Idlebird,D., Jackson,A.,
            Jackson,L., Jacob,L., Jiang,H., Johnson,B., Johnson,R., Jolivet,A.,
            Karpathy,S., Kelly,S., Kelly,S., Khan,Z., King,L., Kovar,C.,
            Kowis,C., Kraft,C.L., Lebow,H., Levan,J., Lewis,L., Li,Z., Liu,J.,
            Liu,J., Liu,W., Liu,Y., London,P., Longacre,S., Lopez,J.,
            Lorensuhewa,L., Loulseged,H., Lozado,R.J., Lu,X., Ma,J.,
            Maheshwari,M., Mahindartne,M., Mahmoud,M., Malloy,K., Mangum,A.,
            Mangum,B., Mapua,P., Martin,K., Martin,R., Martinez,E.,
            Mawhiney,S., McLeod,M.P., McNeill,T.Z., Meenen,E.,
            Milosavljevic,A., Miner,G., Minja,E., Montemayor,J., Moore,S.,
            Morgan,M., Morris,K., Morris,S., Munidasa,M., Murphy,M., Nair,L.,
            Nankervis,C., Neal,D., Newton,N., Nguyen,N., Norris,S.,
            Nwaokelemeh,O., Okwuonu,G., Olarnpunsagoon,A., Pal,S., Parks,K.,
            Pasternak,S., Paul,H., Perez,A., Perez,L., Pfannkoch,C.,
            Plopper,F., Poindexter,A., Popovic,D., Primus,E., Pu,L.-L.,
            Puazo,M., Quiroz,J., Rachlin,E., Reeves,K., Regier,M.A., Reign,R.,
            Reilly,B., Reilly,M., Ren,Y., Reuter,M., Richards,S., Riggs,F.,
            Rives,C., Rodkey,T., Rojas,A., Rose,M., Rose,R., Ruiz,S.J.,
            Sanders,W., Savery,G., Scherer,S., Scott,G., Shatsman,S., Shen,H.,
            Shetty,J., Shvartsbeyn,A., Sisson,I., Sitter,C.D., Smajs,D.,
            Sneed,A., Sodergren,E., Song,X.-Z., Sorelle,R., Sosa,J.,
            Steimle,M., Strong,R., Sutton,A., Svatek,A., Tabor,P., Taylor,C.,
            Taylor,T., Thomas,N., Thomas,S., Tingey,A., Trejos,Z., Usmani,K.,
            Valas,R., Vera,V., Villasana,D., Waldron,L., Walker,B., Wang,J.,
            Wang,Q., Wang,S., Warren,J., Warren,R., Wei,X., White,F.,
            Williams,G., Willson,R., Wleczyk,R., Wooden,H., Worley,K.,
            Wright,D., Wright,R., Wu,J., Yakub,S., Yen,J., Yoon,L., Yoon,V.,
            Yu,F., Zhang,J., Zhou,J., Zhou,X., Zhao,S., Dunn,D., von
            Niederhausern,A., Weiss,R., Smith,D.R., Holt,R.A., Smith,H.O.,
            Weinstock,G. and Gibbs,R.A.
Direct Submission 
Unpublished 

2 (bases 1 to 249703) 
Worley, K.C. 

Direct Submission 

Submitted (22-FEB-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 

3 (bases 1 to 249703) 

Rat Genome Sequencing Consortium. 
Direct Submission 

Submitted (21-SEP-2002) Human Genome Sequencing Center, Department 
of Molecular and Human Genetics, Baylor College of Medicine, One 
Baylor Plaza, Houston, TX 77030, USA 

On Sep 21, 2002 this sequence version replaced gi: 21743383. 
The sequence in this assembly is a combination of BAC based reads 
and whole genome shotgun sequening reads assembled using Atlas 
(http://www.hgsc.bcm.tmc.edu/projects/rat/). As a result, the 
sequence may extend beyond the ends of the clone and there may be 
contigs that consist entirely of whole genome shotgun sequence 
reads. Both end sequences and whole genome shotgun sequence only 
contigs will be indicated in the feature table. 
Genome Center 

Center: Baylor College of Medicine 

Center code : BCM 

Web site: http://www.hgsc.bcm.tmc.edu/ 

Contact: hgsc-help@bcm.tmc.edu 
Project Information 

Center project name: GRQH 

Center clone name: CH230-112A20 
Summary Statistics 

Assembly program: Phrap; version 0.990329 

Consensus quality: 233268 bases at least Q40 

Consensus quality: 235949 bases at least Q30 

Consensus quality: 237476 bases at least Q20 

Estimated insert size: 261159; sum-of-contigs estimation 

Quality coverage: 4x in Q20 bases; sum-of-contigs estimation 



* NOTE: Estimated insert size may differ from sequence length
* (see http://www.hgsc.bcm.tmc.edu/docs/Genbank_draft_data.html)
* NOTE: This sequence may represent more than one clone.
* NOTE: This is a 'working draft' sequence. It currently
* consists of 1 contigs. Gaps between the contigs
* are represented as runs of N. The order of the pieces
* is believed to be correct as given, however the sizes
* of the gaps between them are based on estimates that have
* provided by the submittor.
* This sequence will be replaced
* by the finished sequence as soon as it is available and
* the accession number will be preserved.
* 1 249703: contig of 249703 bp in length.
FEATURES             Location/Qualifiers
     source          1..249703
                     /organism="Rattus norvegicus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:10116"
                     /clone="CH230-112A20"
     misc_feature    1..1198
                     /note="wgs_end_extension
                     clone_end:T7"
     misc_feature    4045..5985
                     /note="wgs_end_extension
                     clone_end:T7"
     misc_feature    complement(6997..7899)
                     /note="clone_boundary
                     clone_end:T7
                     site:EcoRI
                     end_sequence:BH365326"
     misc_feature    138060..196295
                     /note="clone_boundary
                     clone_end:Sp6
                     site:EcoRI
                     end_sequence:BH365327"
BASE COUNT  69486 a 47001 c 47902 g 74288 t 11026 others
ORIGIN



Query Match 23.2%; Score 806.2; DB 2; Length 249703; 

Best Local Similarity 69.2%; Pred. No. 1.1e-163; 

Matches 1420; Conservative 0; Mismatches 573; Indels 60; Gaps 21; 

Qy 1347 GT GAGC CT T C CAGAT GGT C ACC CT C ACACT GCGATT GCT GCT GCAAGAAT GGCAAAGGT G 1406 

I I I I I I I I I I I I I I I I I II I I II II I I I I I I I I I I I I I I I I I I I I I I I II I 
Db 159149 GTGAGCCCTCCAGGTGGTCCTCCTCCCATTGTGATTGCTGCTGCAAGAATGGCAAGGGAG 

159208 

Qy 1407 ACAAAGAAGGGGAGAGC GGCAC GT CTT GCAATGAC CT CT C CAC AT CT AGCT GC GACAGC C 1466 

III I I I I I I I I I I I I I I I II I I I I I I I I II I I I I II II I I I I I I I I I I I I I 

Db 159209 ACA AAGGGGAGAGTGGCACTTCCTGCAATGACCTCTCTACTTCCAGCTGCGACAGCC 

159265 

Qy 1467 AGT CTGAGGC C AGCT CT C CC CAGGAGAC GGT CAT CT GT GGTC C CGT GACAC GC C AGAC CA 1526 

I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I II II II II III II 
Db 159266 AGT CAGAGGC CAGCT CT C C C CAGGAGAC AGT GAT CT GT GGGC CT GTAAC GC GT C AGGGC A 

159325 



Qy 1527 ACAT CCAGACT CT GGACCGTCCCAT CAAGAAGGGCCCT GTCCAGCTGAT CCAACAGTCAG 1586 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I II I I I I I I 
Db 159326 ACAT CCAGACT CT GGAC C GGCC CAT CAAGAAAGGC C C C GT GC AGCT GAT C CAAC AGTCAG 

159385 



Qy 1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646 

I I I I I II I I I I I I I I III J I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I 

Db 159386 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACTTCCGGCTCTAGGGAGTCGAACA 

159445 



Qy 1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCT CT CAATTGAGGAGGAGCTGGAGAAAT 1706 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 159446 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGGT CTCCAT CGAGGAAGAGCTGGAGAAAT 

159505 



Qy       1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
              |||||||||||||||| || | ||||||||||||||| || |||||||||||||||||||
Db     159506 GTATCCAGGATTTCCTGAAGATAAAAATTCCAGATCGCTTCCCTGAGAGAAAACATCCTT 159565



Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I III I I 

Db 159566 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGAGGGCAGTGGGTAGTCA 

159620 



Qy 1827 AAAAAAAAGAGT C ATTT T GAAATT AAC CT C AT AAAAGGAATT C AT ATTTT AAAGGAAAAA 1886 

II I I II I I II I I I I II I I I I I I I I II I I I I I I I I I I II I I I 

Db 159621 C C ACTT T GAAATAAACCT CCT GAAAGGAAGACAT AT ATTAAAGGAAAAA 

159669 



Qy 1887 AAT ACAACTAAT GAT GC ACAT TTCTT AGAAC ACAAT AGTC CAT T GAT AT ACT ACT GCCTA 1946 

I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 159670 T A- ACAACTAACAAT CCATAT GTGTT AGAAC ACAAT - GTCCATTGAT GT CCT ACTGCCTA 

159727 



Qy 1947 CT TT AC CT AGTT C AC CT TAAC AT GTAAAT CC ACAGGGT AGAT T T CTT T CT AGAT GT GGAA 2006 

II I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 159728 CTTT GC CT AGCT CACCTTAAC AT GTAAAT TCACAGGGT AGAT TTCTTTCT AGAT GT GGAA 

159787 



Qy 2007 GT AC AAGAAAAT CTT TTT T AGT TAT T T GT TTGTTTACTTCGTCCCATGTGCTAAC 2061 

I I I I I I III I II I I I I I I I I I I I I I I I I I I I I I 

Db 159788 CCAGAAGCGATGCCCTTATGCTGTCCTCTGTCTCTTATTTACTTGGTCCCATGTGTTGAG 

159847 



Qy 2062 TAT CTT- AT AT AT AAT GAGAGCCAGCT ACGTAAAAGT AGCTGAGAGGCCTT GGGAGTCAT 2120 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I 

Db 159848 AAT CT TAAGGT T CAAGGAGAAC C AGCT AC GT GAGT AGCT C GAAT C C CAAAC CT GCTTTTT 

159907 



Qy 2121 TTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGGGTATG 2180 

I I I I I I I I I I I I I I I I I I I I I I III II 

Db 159908 TGTTTGTTTGTTTTGTTTCCTCTCATTTTCTGCCTCCTTCC-CTTGACCAAGAATGGACA 

159966 



Qy 2181 GTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTTCCTTT 2240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 159967 GTTGAAGGAGATATAACCCAGTGGCATATGTTAAGAAATTATTCTTTTTCCTTTACTTTT 

160026 



Qy 2241 T GTTTATGGGGTTGGGGGGAAT GGCAGATTTAT AT GACTTTT CACTCAAAT CTATATGT G 2300 

II III I I I I III I I I I II I I I II I I I III I I I I I I I I I I I I I I I I 
Db 160027 GT TT AT GGGGT GAGGGGAGAAC GGC AGAT TT GT AT GATTT T C C ACT AAAAT CT CTATGT G 

160086 



Qy 2301 CCAGTTTATATT GACTCCGTAT GCATGAGT ATTT GT GCAACACAAGCA- CAACTAAGTAT 2359 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II I I II 



Db 160087 CCAGGTTCTATTGACTTTGTATGCATGAGCGTTTCTGACACAAGCACAGTATATGTCTGT 

160146 



Qy 2360 GTATATACACATGACGCACACGATGCCAGGGCCTAGACCTCCCAAGGGCTGTGCTCCTGC 2419 

I I I I I I I II III II I I I I I I I I I I I I I III 
Db 160147 AT AT AT GCACAAAGAAT GC AC AT GAC C C AGGGCT G GGACAGC AGAGGGCT AACAC CT T AC 
160206 

Qy 2420 T C C CAGC AGC CCT CT CT T AGAAT AT T T C AGAT GGAT GAGCTT CT GACTCT TT CT T AAAAT 2479 

I I I I I I I I II I I III I I I I I I I I I I I I I I I II I I I I I 

Db 160207 TGCCAGCTGCCC-CTTCAAGAGCGCTTCAGACAACAAAGCCTCTGTCTATTCAGTAAAAC 

160265 

Qy 2480 T CTT T T GGGAAGAT T T C C C AGC CT T TCTT C ACAAC ACTT T CT - AACAT CAAAT GACT CT C 2538 

II I I I I I I I I I I I I I I I I I III I II I I I I I I I I I I I I I I I I 
Db 160266 CCTCCTGGGCAGATTTGCCAGCCTCCCTTGGCAACACTTTCTAAAGCTGTATAGGCCCCC 
160325 

Qy 2539 ATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTACTGC 2598 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I II I I 

Db 160326 AT CAT CAAC AAAT TCCCTTTTTTTGAAACAAATACCCGCAGGCTCCTTTGATTTAC 

160381 

Qy 2599 T TT GCT CTTT GT CT G C ATT AAGAGAG GAT GAGGAGAGCT GGT CAAAC AT T CCTT GT GTT A 2658 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 160382 TGTGCTCTTTCCCTACATCAGGAGCTTGTGAGATGAGCTAGTCTAACCCTGTTTGTGTTT 
160441 

Qy 2659 AA AAAAT CAAACATT C AT AT C C ACAAAAT TT T CT GCT AAAT GACTC C ACACT CAGCC 2715 

II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 160442 AACAGACAAGCAAACAGT CACAT C C ACAAACAGAGCTT C - AAGACAC CACCT ACT CAGCC 
160500 

Qy 2716 TT CT CT ACC CT GAACT GAAT TAT CACC CTT TT CT C CAT GT T T TCAGAGT T CT T ACT GC C C 2775 

I I I I I I III I I II I I I I I I I I I I I II I II I I I I I 

Db 160501 TT CTCCATT CTT ACT AGAAT GAT CACC ATT CT CTAGCTGACT CAGAGTTTTAACTTGCCC 

160560 

Qy 2776 AC AGT TTAAT GGT GT GG C CT T T CCAC AT AAT C CACAT TAAGT TCT GT GT T C CT GT GT T GT 2835 

I I I I I I I I I I I I II II I I I I I I I I I III I II I 

Db 160561 AC AT T TT AT T AAAGAGG C CTT T - GAT AT AAT C CAGGCAAATT CT T T GCAT ACCT GT GGT T 

160619 

Qy 2836 TGTGGAACTAAGGACAACACACA GT ACTT GAAT AAGGGTCCGGCCTTTTGTTTGT 2890 

I I I I I I I I I I I II II II I I I I I I 
Db 160620 TGT GAAGCAAT GAACT AAT TAAAC AT GC AT CCAGCCT T CT GTTCT CTT GT T T T AGAGGAT 
160679 

Qy 2891 TT T AGAGAAAGTT GT AT TC CAC AC ACAAC CTAATAATTT CT T ATAAAAAT T T T AAACT AC 2950 

II I I III III I I I I I I I I I I I I I I I I I I I I I I 
Db 160680 TTGTGTCCCCCCCCTCCCCCGCCACATACATCTTAATTTCTCATACAAACTTTCCACTAC 
160739 

Qy 2951 AAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATTC-GGGCTTTG 3009 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 160740 ACCTATACACTGTTGTTTGCCTGTATCCAGGTTTGGATACCTTTGGAATCCTGGGGTTTG 

160799 



Qy 3010 GCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACAACCCAACAA 3069 

I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 160800 ACT GT GGCC CT ACT AT GGTT T AGT T GT AT C ATTT CT ACAGT GT CT GTAATAAT C C AAGT G 

160859 

Qy 3070 GGT AACT GAAG CT C C AGAGT T AAG GT TT CAGATTT CTAAAT GAAACT AT CT TT T T CAATT 3129 

I I I I I I I I I I III M I I I I I I I I I I I I 

Db 160860 GGT GACT GGAACAT AAAGGT TT CTAATTT GATTTT TT T A AACTTTTTTTTAA 

160911 

Qy 3130 AC AT C CT GACTT GT AT AGACAC AGC CAAAAAGAAACT GTTAAT AGC CAT C C GT C CAT GT A 3189 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 160912 TAGT C CT GACCT GT AT AGAT AC CAT C CAAAAGAAATT GT GAACA- CT GT CT AT C CAT GT G 

160970 

Qy 3190 ACT CT GTATTTT ACTAAGGT ACCAAT AGCT CTTTCATAGACTT GTGCTACAAGAAGGTT A 3249 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 160971 ACTCTGTCATCTATTAATCTACCAGTAGTTCTTCTGTTCACCTGTGTTAAAAGAATGTCC 

161030 

Qy 3250 AAAGAC CAGTT TT - AT TTT CAG CAT T C CT CAT GCATTT CAGT GGTAAC CAAAAAAT AAT T 3308 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I 

Db 161031 AAAGACAACTTTTAATTTTCAGCATT CCTCAT ATATCT CAGT GGTAACTGAAAAAGACGA 

161090 

Qy 3309 TGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATG 3368 

II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

Db 161091 TTT ATCACTAGTGTGTGCCAAGCATTCCT-ATTTTTTGTTTTGTGTGTGTGTGTGTG 

161146 

Qy 3369 T GT GT AT GT GT AT 3381 

I I I I I I I I I I I 
Db 161147 T GT GT GT GT GT GT 161159 



RESULT 13 

BC049734 

LOCUS       BC049734 781 bp mRNA linear ROD 01-APR-2003
DEFINITION  Mus musculus, clone IMAGE:6771233, mRNA.
ACCESSION   BC049734
VERSION     BC049734.1 GI:29436685
KEYWORDS
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 781)
  AUTHORS   Strausberg,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (31-MAR-2003) National Institutes of Health, Mammalian
            Gene Collection (MGC), Cancer Genomics Office, National Cancer
            Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
            USA
  REMARK    NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT     Contact: MGC help desk
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. Jonathan Kuo, NIMH
            cDNA Library Preparation: Michael Brownstein Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Genome Sequence Centre,
            BC Cancer Agency, Vancouver, BC, Canada
            info@bcgsc.bc.ca
            Steven Jones, Jennifer Asano, Ian Bosdet, Yaron Butterfield,
            Susanna Chan, Readman Chiu, Chris Fjell, Erin Garland, Ran Guin,
            Letticia Hsiao, Martin Krzywinski, Reta Kutsche, Oliver Lee, Soo
            Sen Lee, Victor Ling, Carrie Mathewson, Candice McLeavy, Steven
            Ness, Pawan Pandoh, Anna-Liisa Prabhu, Parvaneh Saeedi, Jacqueline
            Schein, Duane Smailus, Michael Smith, Lorraine Spence, Jeff Stott,
            Michael Thorne, Miranada Tsai, Natasja van den Bosch, Jill Vardy,
            George Yang, Scott Zuyderduyn, Marco Marra.
            Clone distribution: MGC clone distribution information can be found
            through the I.M.A.G.E. Consortium/LLNL at: http://image.llnl.gov
            Series: IRAL Plate: 46 Row: g Column: 6
            This clone was selected for full length sequencing because it
            passed the following selection criteria: Hexamer frequency ORF
            analysis.
FEATURES             Location/Qualifiers
     source          1..781
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:6771233"
                     /tissue_type="Testicle, mouse"
                     /clone_lib="NIH_MGC_169"
                     /lab_host="DH10B"
                     /note="Vector: pDNR-LIB"
BASE COUNT  348 a 146 c 169 g 118 t
ORIGIN



Query Match 10.6%; Score 367.4; DB 10; Length 781; 

Best Local Similarity 80.2%; Pred. No. 8.5e-69; 

Matches 463; Conservative 0; Mismatches 96; Indels 18; Gaps 2; 



Qy 1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406 

I I I I I I I I I I I I I I I I I I I I I II II II I I I I I II I I II I I I I I I I I I II I 
Db 101 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 160 



Qy 1407 ACAAAGAAGG GGAGAGC GGC AC GT CT T GCAAT GACCT CT C C ACAT C T AGCT GCGACAGCC 1466 

III II II I I I I I I I I I I I II I I I I I I I I I I I I I I II II I I I I I I I I I II I 
Db 161 ACA AAGGAGAGAGCGGCAC CT C CT GCAAT GACCT GT C C ACT T C C AGCT GT GACAGCC 217 

Qy 1467 AGT CT GAGGC C AGCT CT C C C C AGGAGACGGT CAT CT GT GGT CC C GT GACACGC CAGAC CA 1526 

I I II I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I II II II I I I I I I I II 
Db 218 AGT CAGAGGC CAG CT CT CC GCAGGAGACGGT GAT CT GT GGGCCT GT AACGC GC CAGAGC A 277 

Qy 1527 ACATCCAGACTCT GGACCGT CCCATCAAGAAGGGCCCT GT CCAGCTGATCCAACAGTCAG 1586 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I 
Db 278 ACATCCAGACT CT GGATCGGCCCAT CAAGAAAGGTCCGGTGCAGCT GATCCAACAGTCAG 337 

Qy 1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646 

I I I I I I I I I I I I I I I III I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 338 AGAT GAGGCGGAAAAGT GAC CT GCT C C GGACT CT GAC GT CAGGCT C C AGGGAGT CGAAC A 397 

Qy 1647 T GAGCAGCAAAAAAAAAGCT GTTAAAGAAAAGCTCT CAAT T GAGGAGGAGCT GGAGAAAT 1706 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II I I I I I I I I I I I I I I I I 
Db 398 TAAGC AGCAAAAAGAAAGCTG CGAAGGAAAAGCTCT CC AT C GAGGAAGAG CT G GAGAAAT 457 

Qy 1707 GT AT C C AGGAT TT CCTAAAAAAAAAAAT T C C AGAT C GGT TT CCT GAGAGAAAAC ATC CTT 1766 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 458 GT AT C C AGGATTT CTT GAAGATAAAAAT T C CAGAT CGCTTCCCT GAGC GAAAAC AT C CTT 517 

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III II I II 
Db 518 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG 566 

Qy 1827 AAAAAAAAG AGT CAT T T T GAAAT T AAC C T C AT AAAAG G AAT T CAT AT T T T AAAG G AAAAA 1886 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 567 GT AGT C GCC ACT TT GAAAT AAAC CT C CC CAAAGGAAGACATAT GT TAAAGGAAAAA 622 

Qy 1887 AATACAACTAATGATGCACATTTCTTAGAACACAATA 1923 

I I I I I II I II I I I I I I I 
Db 623 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 659 



RESULT 14 

BX323465 

LOCUS       BX323465 175059 bp DNA linear HTG 06-JUN-2003
DEFINITION  Danio rerio clone DKEYP-9C6, *** SEQUENCING IN PROGRESS 3
            unordered pieces.
ACCESSION   BX323465
VERSION     BX323465.4 GI:31559295
KEYWORDS    HTG; HTGS_PHASE1; HTGS_DRAFT; HTGS_FULLTOP.
SOURCE      Danio rerio (zebrafish)
  ORGANISM  Danio rerio
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Ostariophysi;
            Cypriniformes; Cyprinidae; Danio.
REFERENCE   1 (bases 1 to 175059)
  AUTHORS   Mclaren,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-JUN-2003) Wellcome Trust Sanger Institute, Hinxton,
            Cambridgeshire, CB10 1SA, UK. E-mail enquiries:
            zfish-help@sanger.ac.uk Clone requests: clonerequest@sanger.ac.uk
COMMENT     On Jun 9, 2003 this sequence version replaced gi:29825526.
            Genome Center
            Center: Wellcome Trust Sanger Institute
            Center code: SC
            Web site: http://www.sanger.ac.uk
            Contact: zfish-help@sanger.ac.uk
            Project Information
            Center project name: zKp9C6
            Summary Statistics
            Assembly program: XGAP4; version 4.5
            Chemistry: Dye-terminator; 100% of reads
            Consensus quality: 174665 bases at least Q40
            Consensus quality: 174734 bases at least Q30
            Consensus quality: 174818 bases at least Q20
            Insert size: 174859; sum-of-contigs
            Insert size: 176642; 1.7% error; agarose-fp
            Quality coverage: 10.25x in Q20 bases; sum-of-contigs
            Quality coverage: 10.15x in Q20 bases; agarose-fp
            NOTE: This is a 'working draft' sequence. It currently
            consists of 3 contigs. The true order of the pieces
            is not known and their order in this sequence record is
            arbitrary. Gaps between the contigs are represented as
            runs of N, but the exact sizes of the gaps are unknown.
            This record will be updated with the finished sequence
            as soon as it is available and the accession number will
            be preserved.
            1 44665: contig of 44665 bp in length
            44666 44765: gap of 100 bp
            44766 172724: contig of 127959 bp in length
            172725 172824: gap of 100 bp
            172825 175059: contig of 2235 bp in length.
FEATURES             Location/Qualifiers
     source          1..175059
                     /organism="Danio rerio"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7955"
                     /clone="DKEYP-9C6"
                     /clone_lib="DanioKeypilot"
     misc_feature    1..44665
                     /note="assembly_fragment:01355
                     clone_end:T7
                     vector_side:left"
     misc_feature    44766..172724
                     /note="assembly_fragment:00935.0"
     misc_feature    172825..175059
                     /note="assembly_fragment:01078"
BASE COUNT  56742 a 30793 c 31197 g 56126 t 201 others
ORIGIN



Query Match 9.2%; Score 319.2; DB 2; Length 175059; 

Best Local Similarity 63.6%; Pred. No. 2.3e-58; 

Matches 533; Conservative 0; Mismatches 278; Indels 27; Gaps 2; 

Qy 514 AATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTCC 573 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I III 
Db 127701 AAT GGC C CT GACT GAAAAT T GC AGGACTT AT CAAAC GCC CAAGGAC AGT GGAT GT GCTC A 

127760 

Qy 574 CAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCCA 633 

I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 127761 GAGT T GCT CT T CT GAT GT G GT GGAGCT CAAT GT AGGT GGAC AGGT GT ACT ACACT C GCC A 

127820 



Qy 634 T T CCAC ATT GAT AAGC ATC C CT CAT TCCCTCCTGT GGAAAAT GTTT T CC C CAAAGAGAGA 693 

I I I I I II II III I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 127821 TGCCACCCTCACCAGTGTGCCAAACTCACTGCTGGGTAAATTGTTCTCCTCTAAAAAAGA 

127880 



Qy 694 C ACGGCT AAT GAT CT AGCCAAGGACT C CAAGGGAAGGTT TT T C ATT GAC AGAGAT GGAT T 753 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 127881 CATT T CT AAC GACCT C ACG C AGGAC AT CAAGGGAC GCT ACT T CAT CGAC C GGGAC GGAT T 

127940 



Qy 754 CTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACTT 813 

III III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 127941 TCTCTTTAGGTACGTGCTGGACTATCTCCGCGATAAGACTGTCGTCCTGCCGGATTATTT 128000 

Qy 814 T C C AGAAAAAGGAAGACT GAAAAGG GAAGCT GAAT ACT T CC AGCT CC CAGACT T GGT CAA 873 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II 
Db 128001 TCCGGAGAAGGGGAGGCTGAAACGCGAAGCTGAGTTTTTCCAGCTGCCCGAGCTCGTCAA 128060 

Qy 874 AC T C CT GACC C CC GAT GAAAT C AAGCAAAGC CCAGAT GAAT T CT GC C ACAGT GACT TT GA 933 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 128061 AATCCT AAAC CCAGAT GAT TATAGT C ACAGT GAT TTTGA 128099 

Qy 934 AGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGA 993 

I I II I I I I I I I I I I I I I I II II II I I I I I I I I II III 
Db 128100 C GAAG CAT C C CAGGGAAGCGAC C AGAGGT T AT AT CCAGC CT CTT ACCT GGACGC GC GCGA 128159 

Qy 994 CCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGG 1053 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 128160 C C GAC GCT AC GGCTT CAT CAC GGT C GGGT ACAAGAGCT C GT GC GC AT T CGGGAGGGAC A- 128218 

Qy 1054 AC AGGCAGAT GC CAAGTTT CGGAGAGT T CC C C GGAT T T T GGTTT GTG GAAGGATTT C CT T 1113 

I I I I I I I I II I I I I II I I II I I I I I I I I I I I 
Db 128219 CTGATCCCAAAGCCCGCGGAATACCCAAAATCTTCATTTGCGGAAGAGTCGGTCT 128273 

Qy 1114 GGCAAAAGAAGT CTT T G GAGAAACT T T GAAT GAAAGCAGAGACC CTGAT C GAGCCC CAGA 1173 

I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
Db 128274 GGCGAAAGAAGTT TT C GGCGAC GCACTAAAC GAGAGCAGGGAT C CT GACAGAC C GC C GGA 128333 

Qy 1174 AAGATACACCTCCAGATTTTAT CTCAAATT CAAGCACCT GGAAAGGGCTTTTGATAT GTT 1233 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
Db 128334 GCGTTACACTTCTCAGTTTTACCTGAAGTTTCGCCACCTGGAGCGAGCGTTTGATATGCT 128393 

Qy 1234 GT C AGAGT GT GGATT C CAC AT GGT GGCCT GT AACT CAT C GGT GAC AGCAT CTT T CAT CAA 1293 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I III 
Db 128394 C G C GGAGAGC GGGTT C CAC ATC GT C GCGT GCAAT T CAT C ACT CAC CAC AT CTC CTC ACAA 128453 



Qy 1294 CCAAT ATACAGATGACAAGAT CTGGTCAAGCTACACT GAAT ATGT CTT CT ACCGTGAG 1351 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 128454 CAGGCATGCT GAT GATAGATACTGGTCCAACAACACAGAGT ACGT CTTCT ATCGTAAG 128511 



RESULT 15 
BX470157 

LOCUS       BX470157 200467 bp DNA linear HTG 05-MAY-2003
DEFINITION  Danio rerio clone CH211-119P14, *** SEQUENCING IN PROGRESS 8
            unordered pieces.
ACCESSION   BX470157
VERSION     BX470157.2 GI:30387082
KEYWORDS    HTG; HTGS_PHASE1.
SOURCE      Danio rerio (zebrafish)
  ORGANISM  Danio rerio
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Actinopterygii; Neopterygii; Teleostei; Ostariophysi;
            Cypriniformes; Cyprinidae; Danio.
REFERENCE   1 (bases 1 to 200467)
  AUTHORS   McLay,K.
  TITLE     Direct Submission
  JOURNAL   Submitted (04-MAY-2003) Wellcome Trust Sanger Institute, Hinxton,
            Cambridgeshire, CB10 1SA, UK. E-mail enquiries:
            zfish-help@sanger.ac.uk Clone requests: clonerequest@sanger.ac.uk
COMMENT     On May 5, 2003 this sequence version replaced gi:30349786.
            Genome Center
            Center: Wellcome Trust Sanger Institute
            Center code: SC
            Web site: http://www.sanger.ac.uk
            Contact: zfish-help@sanger.ac.uk
            Project Information
            Center project name: ZC119P14
            Summary Statistics
            Assembly program: XGAP4; version 4.5
            Chemistry: Dye-terminator; 100% of reads
            Consensus quality: 198546 bases at least Q40
            Consensus quality: 199010 bases at least Q30
            Consensus quality: 199314 bases at least Q20
            Insert size: 199767; sum-of-contigs
            Insert size: 201190; 3.3% error; agarose-fp
            Quality coverage: 5.95x in Q20 bases; sum-of-contigs
            Quality coverage: 6.07x in Q20 bases; agarose-fp
            * NOTE: This is a 'working draft' sequence. It currently
            * consists of 8 contigs. The true order of the pieces
            * is not known and their order in this sequence record is
            * arbitrary. Gaps between the contigs are represented as
            * runs of N, but the exact sizes of the gaps are unknown.
            * This record will be updated with the finished sequence
            * as soon as it is available and the accession number will
            * be preserved.
            * 1 9514: contig of 9514 bp in length
            * 9515 9614: gap of 100 bp
            * 9615 14582: contig of 4968 bp in length
            * 14583 14682: gap of 100 bp
            * 14683 18933: contig of 4251 bp in length
            * 18934 19033: gap of 100 bp
            * 19034 66645: contig of 47612 bp in length
            * 66646 66745: gap of 100 bp
            * 66746 73558: contig of 6813 bp in length
            * 73559 73658: gap of 100 bp
            * 73659 113761: contig of 40103 bp in length
            * 113762 113861: gap of 100 bp
            * 113862 193754: contig of 79893 bp in length
            * 193755 193854: gap of 100 bp
            * 193855 200467: contig of 6613 bp in length.
FEATURES             Location/Qualifiers
     source          1..200467
                     /organism="Danio rerio"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:7955"
                     /clone="CH211-119P14"
                     /clone_lib="CHORI-211"
     misc_feature    1..9514
                     /note="assembly_fragment:01971
                     fragment_chain:1
                     clone_end:SP6
                     vector_side:left"
     misc_feature    9615..14582
                     /note="assembly_fragment:01296
                     fragment_chain:1"
     misc_feature    14683..18933
                     /note="assembly_fragment:00946
                     fragment_chain:1"
     misc_feature    19034..66645
                     /note="assembly_fragment:00199
                     fragment_chain:1"
     misc_feature    66746..73558
                     /note="assembly_fragment:02277
                     fragment_chain:2"
     misc_feature    73659..113761
                     /note="assembly_fragment:01029
                     fragment_chain:2"
     misc_feature    113862..193754
                     /note="assembly_fragment:00416
                     fragment_chain:2"
     misc_feature    193855..200467
                     /note="assembly_fragment:02202
                     fragment_chain:2
                     clone_end:T7
                     vector_side:right"
BASE COUNT  64222 a 35134 c 35671 g 64740 t 700 others
ORIGIN



Query Match 9.2%; Score 317.6; DB 2; Length 200467; 

Best Local Similarity 63.5%; Pred. No. 5.2e-58; 

Matches 532; Conservative 0; Mismatches 279; Indels 27; Gaps 2; 

Qy 514 AAT GGCT CTGAGT GGAAACT GT AGT C GT T ATT ATC CT C GAGAACAAGGGT CCGC AGT T CC 573 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I III 
Db 147426 AAT GGCC CT GACT GAAAATTGCAGGACT T AT CAAAC GC C CAAGGAC AGT GGAT GT GCT C A 

147485 



Qy 574 CAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCCA 633 

I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 147486 GAGTTGCT CTT CTGAT GTGGTGGAGCT CAATGTAGGTGGACAGGTGT ACTACACT CGCCA 

147545 



Qy 634 TT C CAC AT TGATAAGC AT C C CT C ATT C CCT CCT GT GGAAAAT GT TT T CC C CAAAGAGAGA 693 

I III II II III I II II III I III INI Ml I II I III 
Db 147546 TGTCACCCTCACCAGTGTGCCAAACTCACTGCTGGGTAAATTGTTCTCCTCTAAAAAAGA 

147605 



Qy 



694 CACGGCTAAT GAT CT AGCCAAGGACTCCAAGGGAAGGTTTTTCATT GACAGAGAT GGATT 753 
II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 



Db 147606 CATTT CTAACGACCTCACGCAGGACAT CAAGGGACGCTACTT CAT CGACCGGGACGGATT 

147665 



Qy 

Db 

147725 



754 CTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACTT 
III III I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
147666 TCTCTTTAGGTACGTGCTGGACTATCTCCGCGATAAGACTGTCGTCCTGCCGGATTATTT 



813 



Qy 
Db 

147785 



814 T CC AGAAAAAGGAAGACTGAAAAGGGAAGCT GAAT ACT T C C AGCT C C C AGACT T GGT CAA 
I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
147726 TCCGGAGAAGGGGAGGCTGAAACGCGAAGCTGAGTTTTTCCAGCTGCCCGAGCTCGTCAA 



873 



Qy 

Db 

147824 



874 ACT C CT GAC CC C C GAT GAAAT CAAGCAAAGCC C AGAT GAATT CT GC C ACAGT GACTTT GA 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

14778 6 AATCCT AACCCCAGATGATTATAGTCACAGT GATTTT GA 



933 



Qy 

Db 

147884 



934 AGAT GC CT C C CAAGGAAGCGACACAAGAAT CT GC C CC C CTT C CTC C CT GCT CC CTGCC GA 
I I I I I I I I I I I I I I I I I I II II II I I I I I I I I II III 
147825 CGAAGCATCCCAGGGAAGCGACCAGAGGTTATATCCAGCCTCTTACCTGGACGCGCGCGA 



993 



Qy 

Db 

147943 



994 C CGCAAGT G GGGT T T CATT ACT GT GGGTTAC AGAGGAT C CT GCAC CT T GGGC AGAGAGG G 1053 
III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

147885 CCGACGCTACGGCTTCATCACGGTCGGGTACAAGAGCTCGTGCGCATTCGGGAGGGACA- 



Qy 

Db 

147998 



1054 ACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTT 
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 

147944 CT GATC C CAAAGCCC GC GGAAT AC C CAAAAT CT T C AT TTGCGGAAGAGT C GGT CT 



1113 



Qy 

Db 

148058 



1114 GGCAAAAGAAGT CT TT GGAGAAACT T T GAAT GAAAGC AGAGAC CCT GAT C GAGC C C CAGA 1173 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
147999 GGC GAAAGAAGT T T T CGGCGAC GCACTAAACGAGAGC C GGGAT CCT GACAGAC C GC CGGA 



Qy 

Db 

148118 



1174 AAGAT ACAC CT CCAGAT TTT AT CT CAAAT T CAAGC AC CT GGAAAG GGCTT T T GAT AT GT T 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

14 8059 GCGTTACACTTCTCAGTTTTACCTGAAGTTTCGCCACCTGGAGCGAGCGTTTGATATGCT 



1233 



Qy 

Db 

148178 



1234 GT C AGAGT GT GGAT TCC ACAT GGT GGC CT GTAACT CAT CGGT GAC AGCAT CTTT CAT CAA 
I I I I I I I I I I I I I II I I I I I I I I I I I I I III I I I II I III 
148119 C GC GGAGAGCGGGT T C CACAT C GT C G C GT GCAAT T CAT C ACT C AC C ACAT CT C C C C ACAA 



1293 



Qy 1294 C CAAT AT AC AGAT GACAAGAT CT GGT CAAGCT AC ACT GAAT AT GT CT T CT AC C GT GAG 1351 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 148179 C AGGC AT GCT GAT GAT AGAT ACT GGT C CAACAACAC AGAGT AC GT CT T CT AT C GTAAG 148236 
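
The "Best Local Similarity" and "Query Match" figures reported for this hit appear to follow directly from the counts in its header. The sketch below assumes that similarity is matches divided by all aligned columns (matches, mismatches and indels) and that query match is the score divided by the perfect score. Both formulas are inferred from the numbers in this report, not taken from GenCore documentation, and the function names are hypothetical.

```python
def best_local_similarity(matches, mismatches, indels):
    """Percentage of aligned columns that are identities (assumed formula)."""
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


def query_match(score, perfect_score):
    """Score as a percentage of the query's self-match score (assumed formula)."""
    return 100.0 * score / perfect_score


# Counts reported for the 200467 bp Danio rerio hit above.
similarity = round(best_local_similarity(matches=532, mismatches=279, indels=27), 1)
match_pct = round(query_match(score=317.6, perfect_score=3468), 1)
print(similarity, match_pct)
```

With the reported counts this reproduces the 63.5% similarity and 9.2% query match printed in the hit's header.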



Search completed: January 29, 2004, 02:30:04 
Job time : 12473 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on:         January 28, 2004, 20:04:04 ; Search time 867 Seconds
                                           (without alignments)
                                           10797.752 Million cell updates/sec

Title:          US-10-056-884A-1
Perfect score:  3468
Sequence:       1 caagcactgtgctaaagtgt..........aaaaaaaaaaaaaaaaaaaa 3468

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.0

Searched:       2552756 seqs, 1349719017 residues

Total number of hits satisfying chosen parameters:      5105512

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries

Database :      N_Geneseq_19Jun03:*
                1:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1980.DAT:*
                2:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1981.DAT:*
                3:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1982.DAT:*
                4:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1983.DAT:*
                5:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1984.DAT:*
                6:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1985.DAT:*
                7:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1986.DAT:*
                8:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1987.DAT:*
                9:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1988.DAT:*
                10:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1989.DAT:*
                11:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1990.DAT:*
                12:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1991.DAT:*
                13:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1992.DAT:*
                14:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1993.DAT:*
                15:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1994.DAT:*
                16:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1995.DAT:*
                17:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1996.DAT:*
                18:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1997.DAT:*
                19:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1998.DAT:*
                20:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA1999.DAT:*
                21:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2000.DAT:*
                22:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2001A.DAT:*
                23:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2001B.DAT:*
                24:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2002.DAT:*
                25:  /SIDS1/gcgdata/geneseq/geneseqn-embl/NA2003.DAT:*
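
The throughput figure in the run header can be reproduced from the other header values. The sketch below assumes that each query residue is compared against each database residue on both strands. The factor of two is an inference (the hit count, 5105512, is exactly twice the 2552756 sequences searched), not something the report states, and the function name is hypothetical.

```python
def million_cell_updates_per_sec(query_length, db_residues, seconds, strands=2):
    """Dynamic-programming cells filled per second, in millions.

    strands=2 is an assumption: the query is taken to be aligned against
    both the forward and the complementary strand of every database entry.
    """
    cells = query_length * db_residues * strands
    return cells / seconds / 1e6


# Values from the run header of this search.
rate = million_cell_updates_per_sec(
    query_length=3468,        # "Perfect score" with the identity table equals the query length
    db_residues=1349719017,   # "Searched: ... residues"
    seconds=867,              # "Search time"
)
print(round(rate, 3))
```

Under that assumption the result rounds to the 10797.752 million cell updates per second shown in the header.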



Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.



                              SUMMARIES
                   %
Result            Query
   No.    Score   Match  Length  DB  ID         Description
-------------------------------------------------------------------
     1     3468   100.0    3468  24  AAD46068   Human K+betaM2 cDN
     2   1640.8    47.3    2412  24  ABN59764   Novel human coding
     3      769    22.2     769  24  AAD46125   Human BAC AC008652
c    4    699.2    20.2     906  22  ABA09216   Human VM106R.1 hom
     5    423.4    12.2     440  22  AAS34230   Human cDNA encodin
     6    319.4     9.2    2398  25  AAD49513   Human TRICH-15 cDN
     7      205     5.9     632  24  ABV99059   Human pancreatic c
     8      201     5.8     614  24  ABV95156   Human pancreatic c
     9      167     4.8    2052  24  ABT09812   Polynucleotide enc
c   10      167     4.8  109201  24  ABQ88125   Human osteoblast d
    11    114.2     3.3     854  24  ABQ40654   Oligonucleotide fo
c   12    114.2     3.3     854  24  ABQ40655   Oligonucleotide fo
    13    109.8     3.2    1757  24  ABQ13668   Oligonucleotide fo
c   14    109.8     3.2    1757  24  ABQ13669   Oligonucleotide fo
c   15    108.6     3.1     854  24  ABQ40656   Oligonucleotide fo
    16    108.6     3.1     854  24  ABQ40657   Oligonucleotide fo
    17    104.6     3.0     688  24  ABT09813   K+beta M6 related
c   18     95.8     2.8    1757  24  ABQ13666   Oligonucleotide fo
    19     95.8     2.8    1757  24  ABQ13667   Oligonucleotide fo
c   20       80     2.3      80  24  AAD46069   Antisense oligonuc
c   21       79     2.3     425  22  AAS60450   Human cancer agent
    22       79     2.3    1119  21  AAC60033   Human secreted pro
    23       79     2.3    1492  21  AAC98102   Human colon cancer
    24       79     2.3    1493  22  AAH34433   Human colon cancer
    25       79     2.3    1493  24  ABL90331   Human polynucleoti
c   26     76.8     2.2    2796  24  ABL90605   Human polynucleoti
    27     76.2     2.2     847  23  ABL06735   Drosophila melanog
c   28     76.2     2.2    2847  23  ABL06734   Drosophila melanog
    29     75.2     2.2    1856  23  ABK43528   DNA encoding novel
c   30       75     2.2     442  24  ABL94107   Arabidopsis thalia
c   31     74.2     2.1     655  22  AAH70113   Human cervical can
    32     74.2     2.1     887  21  AAC59297   Human secreted pro
    33       74     2.1     664  21  AAA26336   Human secreted pro
    34     73.8     2.1    1091  22  AAC89723   Maize ZmGnsNl-1 gl
    35     73.8     2.1    1091  25  ABX95035   cDNA encoding maiz
    36     73.8     2.1    1992  22  AAF72748   Human prostate can
    37     73.6     2.1    1204  21  AAC59836   Human secreted pro
c   38     73.4     2.1     375  23  ABV44911   Human prostate exp
    39     73.4     2.1    2440  22  AAH34932   Human colon cancer
c   40     73.4     2.1    4055  22  AAI58815   Human polynucleoti
    41     73.2     2.1    2377  21  AAC96941   Human secreted pro
c   42       73     2.1     348  22  AAL10133   Human breast cance
    43       73     2.1    1814  25  ABT17358   Human SLC7 related
    44     72.6     2.1     346  23  ABV48988   Human prostate exp
    45     72.2     2.1     297  22  AAS29114   cDNA encoding for
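
The "% Query Match" column of the summaries appears to be the hit score expressed as a percentage of the perfect score (3468 for this query). The sketch below illustrates that relationship for a few of the listed hits. The formula is inferred from values that are legible elsewhere in this report (for example 100.0% for a score of 3468 and 47.3% for a score of 1640.8) rather than taken from GenCore documentation, and the names are hypothetical.

```python
PERFECT_SCORE = 3468  # from the run header of this search


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect score, to one decimal place
    (assumed formula)."""
    return round(100.0 * score / perfect_score, 1)


# (result number, ID, score) for a few of the hits in the summaries.
hits = [
    (1, "AAD46068", 3468),
    (2, "ABN59764", 1640.8),
    (33, "AAA26336", 74),
    (45, "AAS29114", 72.2),
]
percents = {hit_id: query_match_percent(score) for _, hit_id, score in hits}
for number, hit_id, score in hits:
    print(number, hit_id, score, percents[hit_id])
```

Because the scoring table is IDENTITY_NUC, the perfect score equals the query length, so the percentage can also be read roughly as the fraction of the query covered by identities, less gap penalties.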



ALIGNMENTS 



RESULT 1
AAD46068
ID   AAD46068 standard; cDNA; 3468 BP.
XX
AC   AAD46068;
XX
DT   27-DEC-2002  (first entry)
XX
DE   Human K+betaM2 cDNA.
XX
KW   Human; potassium channel beta-subunit; K+betaM2 protein; neural disorder;
KW   reproductive disorder; metabolic disorder; premature puberty; nephritis;
KW   endocrine disorder; memory disorder; neuroendocrine condition; asthma;
KW   spermatogenesis; renal disease; learning deficiency; Alzheimer's disease;
KW   neurodegenerative disease; proliferative disorder; autoimmune disease;
KW   carcinoid tumour; blood coagulation disease; blood platelet disease;
KW   rheumatoid arthritis; allergy; hyperproliferative disease; gene therapy;
KW   graft-versus-host disease; organ rejection; antisterility; thrombolytic;
KW   antiinflammatory; neuroprotective; anti-Parkinsonian; immunosuppressive;
KW   nephrotropic; cytostatic; nootropic; hypotensive; vulnerary; gene; ss.
XX
OS   Homo sapiens.
XX
FH   Key             Location/Qualifiers
FT   CDS             515..1801
FT                   /*tag=  a
FT                   /product= "Human K+betaM2 protein"
XX
PN   WO200266601-A2.
XX
PD   29-AUG-2002.
XX
PF   24-JAN-2002; 2002WO-US02332.
XX
PR   24-JAN-2001; 2001US-263872P.
PR   14-FEB-2001; 2001US-269794P.
XX
PA   (BRIM ) BRISTOL-MYERS SQUIBB CO.
XX
PI   Feder J, Lee L, Chen J, Jackson D, Ramanathan C, Siemers N;
PT   New potassium channel beta-subunit, K+betaM2, proteins and nucleic
PT   acids, useful for diagnosing, treating and/or preventing e.g.
PT   reproductive, neural, metabolic, endocrine, memory, neurodegenerative
PT   disorders or diseases
XX
PS   Claim 1; Page 344-347; 366pp; English.
XX
CC   The present invention relates to human potassium channel beta-subunit
CC   (K+betaM2) proteins and polynucleotides encoding such proteins. The
CC   K+betaM2 sequences are useful for diagnosing, treating and/or preventing
CC   reproductive disorders, neural disorders, disorders related to aberrant
CC   potassium regulation or hyper potassium channel activity, metabolic
CC   disorders (e.g. premature puberty), endocrine disorders (e.g. aberrant
CC   growth hormone synthesis and/or secretion), memory disorder, disorders
CC   of the testis (e.g. spermatogenesis), neuroendocrine condition related
CC   to aberrant thyroid hormone release, renal disease or disorders (e.g.
CC   nephritis), disorders related to aberrant higher brain function (e.g.
CC   learning deficiencies), neurodegenerative diseases (e.g. Alzheimer's
CC   disease), proliferative disorders (e.g. carcinoid tumour) and disorders
CC   involving excessive smooth muscle tone or excitability (e.g. asthma).
CC   They may be used to modulate haemostatic or thrombolytic activity, to
CC   treat or prevent blood coagulation diseases or disorders, blood platelet
CC   diseases, wounds, autoimmune diseases, disorders or conditions (e.g.
CC   rheumatoid arthritis), allergic reactions (e.g. asthma), organ rejection
CC   or graft-versus-host disease, and hyperproliferative diseases. K+betaM2
CC   sequences are also used in gene therapy. The present sequence is human
CC   K+betaM2 cDNA.
XX
SQ   Sequence 3468 BP; 1038 A; 728 C; 703 G; 999 T; 0 other;



Query Match 100.0%; Score 3468; DB 24; Length 3468; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 3468; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CAAGC ACT GT GCT AAAGT GT TT TT C AT ATGT CAT GAAAAGTT GT GCCAGAAAATT AT GGT 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 CAAGCACT GT GCT AAAGT GT TT TT C AT AT GT C AT GAAAAGTT GT GCCAGAAAATT AT GGT 60 

Qy 61 TTGAACAT GGGCAGTTTT CT CCTACCGTCAGCTATATC CACAAGCATCACAT GAAGT GGA 120 

I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 61 TT GAAC AT GGGC AGTT T T CT CCT ACC GT CAGCT AT AT C CACAAGCAT CACAT GAAGT GGA 120 

Qy 121 GATCTGGCAGCTCT GT GTATTT CAGT CAAGTTCCACAATGAAACCTGACAATAAT GGTAA 180 

I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 GATCT GGC AGCT CT GT GT AT TT C AGT CAAGTT C CACAAT GAAAC CT GACAAT AAT GGTAA 180 

Qy 181 AAACCAATACGGACATCT GAGTAACTGGGGAATTGGCCTGCCTT GCAT GTGAGCTT GATG 240 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 AAACCAATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATG 24 0 

Qy 241 GAAGATT GGAT ATAGAC GAGT T GATT AT AT TT TAT GAAGT AGCAGCT C ACT AC CAT C C AC 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 GAAGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCAC 300 

Qy 301 CAT CCAGGGT TTAAAC TACT TT T T CAGC AT CACT TC AC CT GT GGACT CTT AT ACAT T TT G 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 301 CATC CAGGGT TTAAACT ACT TT T T CAGC AT CACT T CAC CT GT GGACT CT T AT AC AT TT T G 360 

Qy 361 ATTTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCT 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 361 AT TT CTT GGGGGAAAAAT ACT GGGATAAGAGGAGGT C ATT T T T T AAT AAGTT AGC ATC CT 420 

Qy 421 TTTCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTT 4 80 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 TT T C CCTT T CTT ACAAGT T GAT C CAAAGGAT AAGGCT GT GACT C CATT GGATT GC ACCTT 4 80 



Qy 



481 TAAATCAAAATAGCAGCAGCAGAAGAAAGGGACAAT GGCTCT GAGT GGAAACTGT AGT CG 540 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 



Db 481 T AAAT CAAAAT AGC AG C AGC AGAAGAAAGGGACAAT GGCT CT GAGT GGAAACT GT AGT C G 540 

Qy 541 TT ATT AT C CT C GAGAACAAGGGT C C GCAGTT C C CAACT C CTT C C CT GAGGTGGT AGAGCT 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 541 TT ATT AT C CT C GAGAACAAGGGT C C GCAGTT CC CAACT C CTT C C CT GAGGT GGT AGAGCT 600 

Qy 601 GAATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTC 660 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I II I I 
Db 601 GAATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTC 660 

Qy 661 C CT C CT GT GGAAAAT GTT TTCC CCAAAGAGAGAC AC GGCT AAT GAT CT AGCCAAGGACT C 720 

I I I I I I I I I I I I I I I I I II I I I I I I I II I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 661 C CT C CT GT GGAAAAT GTT TT CC CCAAAGAGAGAC AC GGCTAAT GAT CT AGC CAAGGACT C 720 

Qy 721 CAAGGGAAGGT TT TT C AT T GAC AGAGAT GGATT CTT GTT C C GT TAT AT T CTGGACT AT CT 780 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 721 CAAGGGAAGGTTTTT CATT GACAGAGATGGATTCTT GTTCCGTTATATT CTGGACTAT CT 780 

Qy 781 CAGGGACAGGCAGGT GGTCCT GCCT GAT CACTTTCCAGAAAAAGGAAGACTGAAAAGGGA 840 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 781 CAGGGACAGGCAGGT GGT C CT GCCT GAT CACTT T CC AGAAAAAGGAAGACT GAAAAGGGA 840 

Qy 841 AGCT GAAT ACT T C CAGCT CCCAGACTT GGT CAAACT C CT GAC C C C C GAT GAAATCAAGCA 900 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I 
Db 841 AGCT GAAT ACT T C CAGCT C CCAGACTT GGT CAAACT C CT GAC C CC C GAT GAAATCAAGCA 900 

Qy 901 AAGC C C AGAT GAATT CT GC C AC AGT GACTTT GAAGAT GC CT CC CAAGGAAGC GAC ACAAG 960 

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 901 AAGC C C AGAT GAATT CTGC CACAGT GACTT T GAAGAT GC CT C C CAAGGAAGC GAC ACAAG 960 

Qy 961 AATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGG 1020 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 

Db 961 AATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGG 1020 

Qy 1021 T T AC AGAGGAT CCT GC AC CTT GGGCAGAGAGGGACAGGCAGAT GC CAAGTTT C GGAGAGT 1080 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1021 T T AC AGAGGAT C CT GCAC CTT GGGCAGAGAGGGACAGGCAGAT GCCAAGTTT C GGAGAGT 1080 

Qy 1081 TCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTT 1140 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1081 TCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTT 114 0 

Qy 1141 GAAT GAAAGCAGAGACCCTGAT CGAGCCCCAGAAAGAT ACACCT CCAGATTTT ATCTCAA 1200 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1141 GAAT GAAAGCAGAGACCCTGAT CGAGCCCCAGAAAGAT ACACCTCCAGATTTT AT CTCAA 1200 

Qy 1201 ATT CAAGC AC CT GGAAAGGGCTT TT GAT AT GT T GT C AGAGT GT G GAT T C CACATGGTGGC 1260 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1201 ATT CAAGC AC CT GGAAAGGGCTTT T GAT AT GTT GT C AGAGT GT GGAT T C C AC AT GGTGGC 1260 

Qy 1261 CT GT AACT CAT C GGT GACAGCAT CTTT CAT CAACCAAT AT ACAGAT GACAAGAT CT GGT C 1320 

I I I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1261 CT GT AACT C AT CGGT GAC AGCAT CTTT CAT CAAC CAAT AT AC AGAT GACAAGAT CT GGT C 1320 

Qy 1321 AAGCT AC ACT GAAT AT GT CTT CTACCGT GAGC CT TC C AGAT GGT C AC CCT CAC ACT GC GA 1380 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 1321 AAGCT AC ACT GAAT AT GT CTT CT AC C GT GAGC CTT C CAGAT GGT CAC CCT C ACACT GCGA 1380 



Qy 1381 TTGCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGA 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 TTGCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGA 1440

Qy 1441 CCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCAT 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 CCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCAT 1500

Qy 1501 CTGTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGG 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 CTGTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGG 1560

Qy 1561 CCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCT 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 CCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCT 1620

Qy 1621 GACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCT 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 GACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCT 1680

Qy 1681 CTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGA 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 CTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGA 1740

Qy 1741 TCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATA 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 TCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATA 1800

Qy 1801 AGGGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAA 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 AGGGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAA 1860

Qy 1861 AAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACA 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 AAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACA 1920

Qy 1921 ATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACA 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1921 ATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACA 1980

Qy 1981 GGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTT 2040
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1981 GGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTT 2040

Qy 2041 TACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGC 2100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2041 TACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGC 2100

Qy 2101 TGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCC 2160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2101 TGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCC 2160

Qy 2161 CTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTT 2220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2161 CTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTT 2220



Qy 2221 TTAATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTT 2280 

I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2221 TTAATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTT 2280 

Qy 2281 T T C ACT CAAAT CT AT AT GT GC C AGT T TAT AT T G ACT C C GT AT G CAT GAGT AT T T GT GC AA 2340 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I 

Db 2281 T T C ACT CAAAT CT AT AT GT G C C AGT T TAT AT T G AC T C C GT AT G CAT GAGT AT T T GT G C AA 2340 

Qy 2341 CACAAGCACAACT AAGT AT GT AT AT ACACAT GAC G C AC AC GAT GC CAGGGC CT AGACCT C 2400 

II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II II I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2341 CACAAGCACAACT AAGT AT GT AT AT ACACAT GAC GCAC AC GAT GC CAGGGC CT AGACCT C 2400 

Qy 2401 CCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCT 2460 

I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2401 CCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCT 2460 

Qy 2461 TCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTC 2520 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2461 TCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTC 2520 

Qy 2521 T AAC AT CAAAT GAC T C T CAT CAT C AAC AAAT T GT AT T C C T TAT T GT G AAAT T AAT AC C C T 2580 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 2521 T AAC AT CAAAT GACT CTCATC AT CAACAAATTGT AT T C CT T AT T GT GAAAT T AAT ACCCT 2580 

Qy 2581 CAGGCTCCATTTTACTGCTTTGCTCTTTGTCTGCATTAAGAGAGGATGAGGAGAGCTGGT 2640 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
Db 2581 CAGGCT CCATTTTACTGCTTT GCT CTTT GT CTGCATTAAGAGAGGATGAGGAGAGCTGGT 2640 

Qy 2641 CAAACAT T CCTT GT GTT AAAAAAAT CAAAC ATT CAT AT C CACAAAATTTT CT GCT AAAT G 2700 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2641 CAAACAT T C CT T GT GT T AAAAAAAT CAAAC ATT CAT AT C CACAAAAT TT T CT GCT AAAT G 2700 

Qy 2701 ACTCCACACTCAGCCTTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCA 2760 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2701 ACTCCACACTCAGCCTTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCA 2760

Qy 2761 GAGTT CTT ACT GCCCAC AGTTTAAT GGT GT GGCCT T T C CACATAAT C C ACAT T AAGTT CT 2820 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2761 GAGT T CT T ACT GC CCAC AGTT T AAT GGT GT GGC CT T T C CACATAAT C C ACAT T AAGTT CT 2820 

Qy 2821 GT GT T C CT GT GTT GT T GT GGAACT AAG GACAACACAC AGT ACT T GAAT AAGGGTC C GGC C 2 88 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I 
Db 2821 GT GT T C CT GT GT T GTT GT GGAACTAAGGACAAC ACAC AGT ACT T GAAT AAGGGTC C GGCC 2880 

Qy 2881 TT T T GTTT GT TT T AGAGAAAGTT GT AT T CCAC AC ACAAC CTAATAATTT CTT ATAAAAAT 294 0 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I 
Db 2881 T T T T GT T T GT TTT AGAGAAAGT T GT ATT C C ACAC ACAAC CT AAT AAT T T C TT ATAAAAAT 294 0 

Qy 2941 TTTAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATT 3000 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I 
Db 2941 TTTAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATT 3000 

Qy 3001 CGGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACA 3060 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 

Db 3001 CGGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACA 3060 

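Qy 3061 ACCCAACAAGGTAACTGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCT 3120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 3061 ACCCAACAAGGTAACTGAAGCTCCAGAGTTAAGGTTTCAGATTTCTAAATGAAACTATCT 3120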



Qy 3121 TTTT CAATT AC AT C CT GACTT GT AT AGAC AC AGC C AAAAAGAAACT GT TAAT AGC C AT CC 3180 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
Db 3121 TTTTCAATT ACAT CCTGACTTGTATAGACACAGCCAAAAAGAAACT GTTAAT AGCCAT CC 3180 

Qy 3181 GTCCATGTAACTCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACA 3240 

I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
Db 3181 GTCCATGTAACTCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACA 3240 

Qy 3241 AGAAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAA 3300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3241 AGAAGGT T AAAAGAC CAGT TT T ATTT T CAGC AT T C CT CAT GC AT TT CAGT GGT AACCAAA 3300 

Qy 3301 AAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTG 3360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3301 AAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTG 3360

Qy 3361 T GT GC AT GTGT GT AT GT GT AT C ACAGGTAAT AAAGGCAAT T GGAT GAT TAAAAAAAAAAA 3420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3361 T GT GC AT GT GT GT AT GTGT AT CACAGGTAAT AAAGGCAAT T GGATGATTAAAAAAAAAAA 3420 

Qy 3421 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 3421 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 



RESULT 2
ABN59764
ID   ABN59764 standard; cDNA; 2412 BP.
XX
AC   ABN59764;
XX
DT   28-JUN-2002  (first entry)
XX
DE   Novel human coding sequence SEQ ID NO: 175.
XX
KW   Human; antianaemic; vulnerary; antiinflammatory; immunomodulator;
KW   antiinfertility; cerebroprotective; cytostatic; rheumatic; gene therapy;
KW   neuroprotective; antiparkinsonian; protein therapy; EST;
KW   expressed sequence tag; gene; ss.
XX
OS   Homo sapiens.
XX
PN   WO200222660-A2.
XX
PD   21-MAR-2002.
XX
PF   10-SEP-2001; 2001WO-US26015.
XX
PR   11-SEP-2000; 2000US-0659671.
XX
PA   (HYSE-) HYSEQ INC.
XX
PI   Tang YT, Liu C, Zhou P, Asundi V, Zhang J, Zhao QA, Ren F;
PI   Xue AJ, Yang Y, Wehrman T, Drmanac RT;
XX
DR   WPI; 2002-292408/33.
DR   P-PSDB; ABB97351.
XX
PT   An isolated polynucleotide for treating diseases associated with its
PT   encoded polypeptide such as cancer and multiple sclerosis -
XX
PS   Claim 1; SEQ ID NO 175; 509pp; English.
XX
CC   The present invention provides the protein and coding sequences of 444
CC   novel human proteins. These were isolated from expressed sequences tags
CC   (ESTs). They can be used to stimulate cell growth, to regulate
CC   haematopoiesis e.g. to treat aplastic anaemia, to help tissue regrowth
CC   e.g. in burn treatment, to regulate the immune system e.g. to treat
CC   multiple sclerosis, to regulate activin or inhibin e.g. to treat
CC   infertility, to regulate haemostasis or thrombolysis e.g. to treat
CC   stroke and cancer, to screen for drugs, to treat inflammatory conditions
CC   e.g. rheumatoid arthritis, and to treat nervous system disorders e.g.
CC   Parkinson's disease. The present sequence is a coding sequence of the
CC   invention.
XX
SQ   Sequence 2412 BP; 638 A; 585 C; 551 G; 638 T; 0 other;

Query Match 47.3%; Score 1640.8; DB 24; Length 2412; 

Best Local Similarity 99.6%; Pred. No. 0; 

Matches 1645; Conservative 0; Mismatches 7; Indels 0; Gaps 0; 

Qy     183 ACCAATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGA 242
           || || ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     760 ACTAAGACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATGGA 819

Qy     243 AGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCA 302
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     820 AGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCACCA 879

Qy     303 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 362
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     880 TCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTGAT 939

Qy     363 TTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTT 422
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     940 TTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTT 999

Qy     423 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 482
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1000 TCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTA 1059

Qy     483 AATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTT 542
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1060 AATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTT 1119

Qy     543 ATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGA 602
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1120 ATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGA 1179

Qy     603 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 662
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1180 ATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCC 1239

Qy     663 TCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCA 722
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1240 TCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCA 1299

Qy     723 AGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCA 782
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1300 AGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCA 1359

Qy     783 GGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAG 842
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1360 GGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAG 1419

Qy     843 CTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAA 902
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1420 CTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAA 1479

Qy     903 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 962
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1480 GCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAA 1539

Qy     963 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1022
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1540 TCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTT 1599

Qy    1023 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1082
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1600 ACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTC 1659

Qy    1083 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1142
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1660 CCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGA 1719

Qy    1143 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1202
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1720 ATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAAT 1779

Qy    1203 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1262
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1780 TCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCT 1839

Qy    1263 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1322
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1840 GTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTCAA 1899

Qy    1323 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1382
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1900 GCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATT 1959

Qy    1383 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 1442
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1960 GCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACC 2019

Qy    1443 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 1502
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2020 TCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCT 2079

Qy    1503 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 1562
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2080 GTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCC 2139

Qy    1563 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGA 1622
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Db    2140 CTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTGA 2199

Qy    1623 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 1682
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2200 CTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCT 2259

Qy    1683 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATC 1742
           |||||||||||||||||||||||||||||||||||||||||||||  |||||||||||||
Db    2260 CAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAATCAAAATTCCAGATC 2319

Qy    1743 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 1802
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    2320 GGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATAAG 2379

Qy    1803 GGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAA 1834
           ||||||||||||||||| ||| ||||||||||
Db    2380 GGAGGGCTGGGGGCGGGAAAAGAAAAAAAAAA 2411




RESULT 3 
AAD46125 

ID AAD46125 standard; DNA; 769 BP. 
XX 

AC AAD46125; 
XX 

DT 27-DEC-2002 (first entry) 
XX 

DE Human BAC AC008652 exon used to isolate K+betaM2 cDNA. 
XX 

KW Human; potassium channel beta-subunit; K+betaM2 protein; neural disorder;

KW reproductive disorder; metabolic disorder; premature puberty; nephritis; 

KW endocrine disorder; memory disorder; neuroendocrine condition; asthma; 

KW spermatogenesis; renal disease; learning deficiency; Alzheimer's disease; 

KW neurodegenerative disease; proliferative disorder; autoimmune disease; 

KW carcinoid tumour; blood coagulation disease; blood platelet disease; 

KW rheumatoid arthritis; allergy; hyperproliferative disease; gene therapy;

KW graft-versus-host disease; organ rejection; antisterility; thrombolytic;

KW antiinflammatory; neuroprotective; anti-Parkinsonian; immunosuppressive; 

KW nephrotropic; cytostatic; nootropic; hypotensive; vulnerary; ds.
XX 

OS Homo sapiens.
XX 

PN WO200266601-A2. 
XX 

PD 29-AUG-2002. 
XX 

PF 24-JAN-2002; 2002WO-US02332.
XX

PR 24-JAN-2001; 2001US-263872P.

PR 14-FEB-2001; 2001US-269794P.
XX

PA (BRIM) BRISTOL-MYERS SQUIBB CO.
XX 

PI Feder J, Lee L, Chen J, Jackson D, Ramanathan C, Siemers N; 

PI Chang H, Carroll P; 

XX 

DR WPI; 2002-691617/74. 
XX 

PT New potassium channel beta-subunit, K+betaM2, proteins and nucleic 

PT acids, useful for diagnosing, treating and/or preventing e.g. 

PT reproductive, neural, metabolic, endocrine, memory, neurodegenerative 

PT disorders or diseases 

XX 

PS Example 1; Page 349-350; 366pp; English. 
XX 

CC The present invention relates to human potassium channel beta-subunit 

CC (K+betaM2) proteins and polynucleotides encoding such proteins. The 

CC K+betaM2 sequences are useful for diagnosing, treating and/or preventing 

CC reproductive disorders, neural disorders, disorders related to aberrant 

CC potassium regulation or hyper potassium channel activity, metabolic 

CC disorders (e.g. premature puberty), endocrine disorders (e.g. aberrant 

CC growth hormone synthesis and/or secretion), memory disorder, disorders

CC of the testis (e.g. spermatogenesis), neuroendocrine condition related 

CC to aberrant thyroid hormone release, renal disease or disorders (e.g. 

CC nephritis), disorders related to aberrant higher brain function (e.g. 

CC learning deficiencies), neurodegenerative diseases (e.g. Alzheimer's

CC disease), proliferative disorders (e.g. carcinoid tumour) and disorders 

CC involving excessive smooth muscle tone or excitability (e.g. asthma).

CC They may be used to modulate haemostatic or thrombolytic activity, to 

CC treat or prevent blood coagulation diseases or disorders, blood platelet 

CC diseases, wounds, autoimmune diseases, disorders or conditions (e.g. 

CC rheumatoid arthritis), allergic reactions (e.g. asthma), organ rejection 

CC or graft-versus-host disease, and hyperproliferative diseases. K+betaM2

CC sequences are also used in gene therapy. The present sequence is human 

CC BAC AC008652 exon used to isolate K+betaM2 cDNA. This sequence is used 

CC in the exemplification of the invention. 
XX 

SQ Sequence 769 BP; 209 A; 180 C; 184 G; 196 T; 0 other; 



Query Match 22.2%; Score 769; DB 24; Length 769;

Best Local Similarity 100.0%; Pred. No. 6.5e-143;

Matches 769; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy     393 AGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGATCCAAAGGATA 452
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1 AGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGATCCAAAGGATA 60

Qy     453 AGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAGAAGAAAGGGA 512
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      61 AGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAGAAGAAAGGGA 120

Qy     513 CAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTC 572
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     121 CAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTC 180

Qy     573 CCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCC 632
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     181 CCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCC 240

Qy     633 ATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAG 692
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     241 ATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAG 300

Qy     693 ACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGAT 752
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     301 ACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGAT 360

Qy     753 TCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACT 812
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     361 TCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACT 420

Qy     813 TTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCA 872
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     421 TTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCA 480

Qy     873 AACTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTG 932
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     481 AACTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTG 540

Qy     933 AAGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCG 992
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     541 AAGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCG 600

Qy     993 ACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGG 1052
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     601 ACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGG 660

Qy    1053 GACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCT 1112
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     661 GACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCT 720

Qy    1113 TGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGA 1161
           |||||||||||||||||||||||||||||||||||||||||||||||||
Db     721 TGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGA 769



RESULT 4 
ABA09216/C 

ID ABA09216 standard; cDNA; 906 BP. 
XX 

AC ABA09216; 
XX 

DT 11-JAN-2002 (first entry)
XX 

DE Human VM106R.1 homologue-encoding cDNA, SEQ ID NO: 992. 
XX 

KW Human; cytokine; cell proliferation; cell differentiation; growth factor; 

KW haematopoiesis regulation; tissue growth; immunomodulator; activin;

KW inhibin; chemotaxis; chemokinesis; thrombolysis; oncogenesis;

KW proliferation; metastasis; cancer; tumour; haematopoietic disorder; 

KW myeloid cell disorder; lymphoid cell disorder; asthma; arthritis; 

KW chronic inflammatory condition; proliferative retinopathy; 

KW atherosclerosis; coronary heart disease; arterial ischaemia; 

KW bone disorder; osteoporosis; vascular growth disorder; 



KW tissue regeneration; wound healing; infection; immune disorder; 

KW cell culture; drug screening; gene therapy; antiinflammatory; 

KW antiasthmatic; antiarthritic; haemostatic; antiarteriosclerotic; 

KW cytostatic; osteopathic; vasotropic; cardiant; virucide; antibacterial; 

KW antifungal; vulnerary; antiulcer; ss. 

XX 

OS Homo sapiens. 
XX 

PN WO200157188-A2. 
XX 

PD 09-AUG-2001. 
XX 

PF 05-FEB-2001; 2001WO-US03800.
XX

PR 03-FEB-2000; 2000US-0496914.

PR 27-APR-2000; 2000US-0560875.
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Tang YT, Liu C, Drmanac RT; 
XX 

DR WPI; 2001-457740/49. 

DR P-PSDB; ABB11972. 
XX 

PT Human proteins and DNA encoding sequences useful for preventing, 

PT treating or ameliorating a medical condition in a mammalian subject 

PT e.g. arthritis and cancer - 

XX 

PS Claim 1; Page 844-845; 1963pp; English. 
XX 

CC Sequences ABB10981-ABB12330 represent 1350 novel human polypeptides, and 

CC sequences ABA08225-ABA09574 represent nucleic acids encoding them. The 

CC invention also relates to vectors and recombinant host cells comprising a 

CC nucleotide of the invention, methods of producing the novel polypeptides, 

CC antibodies against the polypeptides, methods of detecting the nucleotides 

CC or polypeptides in a sample, and methods of identifying compounds which 

CC bind to polypeptides of the invention. Although novel, many of the 

CC polypeptides of the invention have homology to known proteins, thereby 

CC giving an insight into their probable biological activities, and hence 

CC potential therapeutic applications. The polypeptides of the invention may 

CC have various activities, including cytokine, cell proliferation or cell 

CC differentiation activities; stem cell growth factor activity; 

CC haematopoiesis regulatory activity; tissue growth activity; 

CC immunomodulatory activity; activin- or inhibin-related activities; 

CC chemotactic or chemokinetic activities; haemostatic, thrombotic or 

CC thrombolytic activities; receptor or ligand activities; or may be 

CC involved in oncogenesis, cancer cell proliferation or metastasis. 

CC Depending on their biological activities, polypeptides and nucleotides of 

CC the invention are useful for preventing, treating or ameliorating medical 

CC conditions, e.g., by protein or gene therapy. Such conditions include 

CC cancers, haematopoietic disorders (e.g., myeloid or lymphoid cell

CC disorders), chronic inflammatory conditions (e.g., asthma or arthritis), 

CC proliferative retinopathy, atherosclerosis, coronary heart disease, 

CC arterial ischaemia, bone disorders (e.g., osteoporosis), and abnormal 

CC vascular growth. Polypeptides involved with tissue regeneration and 

CC repair (or nucleic acids encoding them) may be used to promote wound 

CC healing (e.g., of burns, incisions and ulcers), while those with 



CC immunomodulatory activities may be used in the treatment of viral, 

CC bacterial and fungal infections in addition to immune disorders. 

CC Polypeptides with growth factor activity may be used in cell cultures to 

CC promote cell growth. For example, such polypeptides may be used to 

CC manipulate stem cells in culture to give rise to neuroepithelial cells 

CC that can be used to augment or replace cells damaged by illness, 

CC autoimmune disease or accidental damage. The polypeptides and nucleotides 

CC may also be used in the diagnosis of the above conditions, and in drug 

CC screening techniques. The present sequence represents a cDNA encoding a 

CC novel human polypeptide of the invention. 

XX 

SQ Sequence 906 BP; 220 A; 225 C; 216 G; 245 T; 0 other; 

Query Match 20.2%; Score 699.2; DB 22; Length 906; 

Best Local Similarity 98.9%; Pred. No. 4.6e-129; 

Matches 704; Conservative 0; Mismatches 8; Indels 0; Gaps 0; 

Qy     515 ATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTCCC 574
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     906 ATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTCCC 847

Qy     575 AACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCCAT 634
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     846 AACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCCAT 787

Qy     635 TCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAGAC 694
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     786 TCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAGAC 727

Qy     695 ACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGATTC 754
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     726 ACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGATTC 667

Qy     755 TTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACTTT 814
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     666 TTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACTTT 607

Qy     815 CCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCAAA 874
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     606 CCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCAAA 547

Qy     875 CTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTGAA 934
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     546 CTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTGAA 487

Qy     935 GATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGAC 994
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     486 GATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGAC 427

Qy     995 CGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGA 1054
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     426 CGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGA 367



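Qy    1055 CAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTG 1114
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     366 CAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTG 307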



Qy    1115 GCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAA 1174
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     306 GCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAA 247

Qy    1175 AGATACACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTG 1226
           ||||||||||||||||||||||||||||||||||||||     ||||   ||
Db     246 AGATACACCTCCAGATTTTATCTCAAATTCAAGCACCTAATGGGGGCACCTG 195



RESULT 5 
AAS34230 

ID AAS34230 standard; cDNA; 440 BP. 
XX 

AC AAS34230; 
XX 

DT 17-DEC-2001 (first entry) 
XX 

DE Human cDNA encoding a novel foetal antigen, SEQ ID No 754. 
XX 

KW Human; foetal tissue antigen; ss; antiinflammatory; neuroprotective; 

KW immunomodulator; cardiovascular; cytostatic; nephrotropic;

KW cardiovascular; autoimmune disease; rheumatoid arthritis;

KW hyperproliferative disorder; breast neoplasm; cancer;

KW cardiovascular disorder; cardiac arrest; cerebrovascular disorder;

KW cerebral ischaemia; angiogenesis; nervous system disorder;

KW Alzheimer's disease; infection; ocular disorder; corneal infection; 

KW wound healing; epithelial cell proliferation; food additive. 

XX 

OS Homo sapiens.
XX 

PN WO200155312-A2. 
XX 
2000,
PR
-JUL-
? 2000US-
PR
0218290.
PR
0220963.
PR
-JUL-
; 2000US-0220964.
PR
0224518.
PR
P 2000US-
PR 14-AUG-2000,
0225213.
PR 14
2000;
PR 14
2000;
2000;
2000;
PR
2000;
PR 01-SEP-2000;
PR
2000;
PR
2000;
PR 05-SEP-2000;
PR 05
PR 06-SEP-2000;
PR
PR 08
08
PR 08
2000;
14
-OCT-
PR 05-JAN-2001; 2001US-0259678.
XX 

PA (HUMA-) HUMAN GENOME SCI INC. 
XX 

PI Rosen CA, Barash SC, Ruben SM; 
XX 

DR WPI; 2001-488782/53. 

DR P-PSDB; AAU21410. 
XX 

PT New polynucleotides and polypeptides for diagnosing, treating, 

PT preventing or prognosing e.g. diseases or disorders of the nervous, 

PT musculoskeletal, excretory, gastrointestinal, reproductive, and 

PT respiratory systems 

XX 

PS Claim 1; SEQ ID No 754; 642pp; English. 
XX 

CC The invention relates to novel nucleic acids encoding novel human foetal 

CC antigens. The nucleic acids and proteins are used to prevent, treat (e.g. 

CC by gene therapy) or ameliorate a medical condition in e.g. humans, mice, 

CC rabbits, goats, horses, cats, dogs, chickens or sheep. They 

CC are also used in diagnosing a pathological condition or susceptibility 

CC to a pathological condition. The antibodies to the antigens can also 

CC be used in alleviating symptoms associated with the disorders and in 

CC diagnostic immunoassays e.g. radioimmunoassays or enzyme linked 

CC immunosorbent assays (ELISA). Disorders which are diagnosed or treated

CC include autoimmune diseases e.g. rheumatoid arthritis, 

CC hyperproliferative disorders e.g. neoplasms of the breast or liver,

CC cardiovascular disorders e.g. cardiac arrest, cerebrovascular disorders 

CC e.g. cerebral ischaemia, angiogenesis, nervous system disorders e.g.

CC Alzheimer's disease, infections caused by bacteria, viruses and fungi 

CC and ocular disorders e.g. corneal infection. The polypeptides can also 

CC be used to aid wound healing and epithelial cell proliferation, to 

CC prevent skin aging due to sunburn, to maintain organs before 

CC transplantation, for supporting cell culture of primary tissues, to 

CC regenerate tissues and in chemotaxis. The polypeptides can also be used

CC as a food additive or preservative to increase or decrease storage 

CC capabilities, fat content, lipid, protein, carbohydrate, vitamins, 

CC minerals, cofactors and other nutritional components. Numerous

CC examples of diseases and disorders treated by the nucleic acids and 

CC proteins are given in the specification. The present sequence 

Query Match 12.2%; Score 423.4; DB 22; Length 440; 

Best Local Similarity 98.9%; Pred. No. 1.7e-74; 

Matches 435; Conservative 0; Mismatches 4; Indels 1; Gaps 1; 

Qy    1842 TTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGAT 1901
           ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||
Db       1 TTTGAAATTAACCTCCTAAAAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGAT 60

Qy    1902 GCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCAC 1961
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      61 GCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCAC 120

Qy    1962 CTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTT 2021
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     121 CTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTT 180

Qy    2022 TTTTAGTTATTTGTTTGTTTACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAG 2081
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db     181 TTTTAGTTATTTGTTTGTTTACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAG 240

Qy    2082 CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGG-TTTTT 2140
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Db     241 CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTT 300

Qy    2141 TCTCTCATCCTTCTACCTCCCTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCA 2200
           ||||||||||||||||||  ||||||||||||||||||||||||||||||||||||||||
Db     301 TCTCTCATCCTTCTACCTNNCTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCA 360

Qy    2201 ATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGA 2260
           ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Db     361 ATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTTCCTTTTGTNTATGGGGTTGGGGGGA 420

Qy    2261 ATGGCAGATTTATATGACTT 2280
           ||||||||||||||||||||
Db     421 ATGGCAGATTTATATGACTT 440

RESULT 6 
AAD49513 

ID AAD49513 standard; cDNA; 2398 BP. 
XX 

AC AAD49513; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human TRICH-15 cDNA. 
XX 

KW Human; transporter and ion channel; TRICH; atherosclerosis; cancer; 

KW gene therapy; gene; ss. 

XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers 

FT CDS 114..1535

FT /*tag= a

FT /product= "Human TRICH protein"

FT sig_peptide 114..230

FT /*tag= b

FT mat_peptide 231..1532

FT /*tag= c 

FT /product= "Mature human TRICH protein" 
XX 

PN WO200283712-A2. 
XX 

PD 24-OCT-2002. 
XX 

PF 12-APR-2002; 2002WO-US11760.
XX

PR 12-APR-2001; 2001US-283440P.

PR 20-APR-2001; 2001US-285592P.

PR 27-APR-2001; 2001US-287263P.

PR 04-MAY-2001; 2001US-288666P.

PR 18-MAY-2001; 2001US-292042P.
PR 25-MAY-2001; 2001US-293724P.
PR 22-JAN-2002; 2002US-351107P.
XX
PA (INCY-) INCYTE GENOMICS INC.
XX
PI Baughn MR, Elliott VS, Hafalia AJA, Yang J, Walia NK, Ramkumar J;
PI Forsythe IJ, Lu Y, Tang YT, Yue H, Raumann BE, Lai PG, Azimzai Y;
PI Lu DAM, Gandhi AR, Thornton M, Nguyen DB, Arvizu CS, Emerling BM;
PI Swarnakar A, Yao MG, Ding L, He A, Griffin JA, Sanjanwala MM;
PI Gietzen KJ, Lee EA, Xu Y, Au-Young JK, Das D, Lee SY, Chang H;
XX
DR WPI; 2003-092996/08.
DR P-PSDB; AAE32081.
XX
PT New human functional transporters and ion channels (TRICH)
PT polypeptides, useful for preparing a composition for diagnosing or
PT treating a disease associated with decreased expression or
PT overexpression of TRICH e.g. cancer
XX
PS Claim 5; Page 200-201; 204pp; English.
XX
CC The invention relates to human transporters and ion channels (TRICH)
CC polypeptides and nucleic acid molecules encoding such polypeptides.
CC TRICH proteins are useful for preparing compositions for diagnosing or
CC treating diseases or conditions associated with decreased expression
CC or overexpression of functional TRICH e.g. atherosclerosis or cancer.
CC The invention is useful in gene therapy. The present sequence is
CC human TRICH cDNA.
XX
SQ Sequence 2398 BP; 644 A; 588 C; 604 G; 562 T; 0 other;



Query Match 9.2%; Score 319.4; DB 25; Length 2398;

Best Local Similarity 58.0%; Pred. No. 9.9e-54;

Matches 769; Conservative 0; Mismatches 466; Indels 90; Gaps 8;



Qy     560 GGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTT 619
           ||| ||   |  |||    |||||||||| || || |||||||| || || || || |||
Db     216 GGGCCCTGCGCACCCTCGCCCTTCCCTGAAGTAGTGGAGCTGAACGTAGGCGGCCAGGTT 275

Qy     620 TATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTT 679
           ||| | ||    || || ||  || | ||| ||||  |        ||   |  |||||
Db     276 TATGTGACCAAGCACTCGACGCTGCTCAGCGTCCCGGACAGTACTTTGGCCAGCATGTTC 335


Qy 680 T CCCCAAAGAGAGACACGGCTAAT GAT CTAGCCAAGGACT CCAAG 724

I I I I II I I I I I I I I I I I I I I I

Db 336 TCGCCCTCTAGTCCCCGTGGCGGCGCCCGGCGCCGGGGCGAGCTGCCCAGGGACAGCCGG 395

Qy 725 GGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGG 784

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II

Db 396 GCGCGCTTCTTCATCGACCGGGACGGCTTCCTTTTCAGGTACGTGCTGGATTATCTGCGG 455

Qy 785 GACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCT 844

I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I

Db 456 GACAAGCAACTCGCGCTGCCGGAGCACTTCCCCGAGAAGGAGCGGCTGCTGCGCGAGGCC 515



Qy 845 GAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGC 904

II II I I I I I I I I I I I I I I I I I I I I I II III I III I I I I I I I I I I I

Db 516 GAGTATTTCCAGCTCACCGACTTGGTCAAGCTGCTGTCGCCCAAGGTCACCAAGCAGAAC 575

Qy 905 CC AGAT GAATT CT GC CACAGT GACT TT GA AGATGC 939 

I I I I I I I I II I I I I I I I I I I I I I 

Db 576 TCTCTCAACGACGAGGGCTGCCAGAGCGACCTGGAGGACAACGTCTCGCAGGGTAGCAGC 635 

Qy 940 CTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCC 986 

II I I I I I I I I I I I I I I I I I I 

Db 636 GACGCGCTGCTGCTGCGCGGGGCGGCGGCCGCCGTGCCCTCGGGCCCGGGAGCGCACGGT 695 

Qy 987 CTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGA 1027 

I III I I I I I II I I I I I II II I I 

Db 696 GGTGGCGGCGGCGGCGGCGCGCAGGACAAGCGCTCGGGCTTCCTCACGCTGGGCTACCGG 755 

Qy 1028 GGAT C CT GCAC CTT GGGC AGAGAGGGACAGGC AGAT GC CAAGTT T C GGAGAGTT CC C C GG 1087 

I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I 

Db 756 GGCTCCTACACCACCGTGCGCGACAACCAGGCCGACGCCAAATTCCGGCGTGTGGCGCGC 815 

Qy 1088 AT TT T G GTT T GT GGAAGGATT T CCT T GGCAAAAGAAGT CTT T G GAGAAACTT T GAAT GAA 1147 

II I I I I I I I I III I I I II I I I I I I I I I II I I II I I I I I 

Db 816 ATCATGGTGTGCGGGCGCATCGCGCTGGCCAAGGAGGTCTTCGGGGACACGCTCAACGAG 875 

Qy 1148 AGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAAG 1207

II I I I I II I II II II II I I I I I I I I I I I I I I I I I I I I I I I

Db 876 AGCCGCGACCCCGACCGGCAGCCGGAGAAGTACACGTCCCGCTTCTACCTCAAGTTCACC 935

Qy 1208 CACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAAC 1267 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I II 

Db 936 TACTTGGAGCAGGCCTTTGATCGCCTGTCCGAGGCCGGCTTCCACATGGTGGCGTGTAAC 995 

Qy 1268 T CAT CGGTGACAGCAT CTTTCATCAACCAATATACAGATGACAAGAT CT GGT CAAGCTAC 1327 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I 

Db 996 T C CT C GGGCAC CGC CGC CT T C GTCAAC C AGT AC CG C GAC GACAAGAT CT GGAGCAGCT AC 1055 

Qy 1328 ACT GAAT AT GT CT T CT ACC GT GAG C CT T CC AGAT GGT CAC CCT C AC ACT GC GATT GC 1384 

I I I I I I I I I I I I I I III II II I I I III II 

Db 1056 AC CGAGT AC AT T TT CT T CC GACCAC CT C AGAAAAT AGT AT CACCTAAACAAGAAC AT GAA 1115 

Qy 1385 TGCTGCAAGAATGGCAAAG GT GACAAAGAAGGGGAGAGC GGCAC GT CT T GC AAT GAC 1441 

III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1116 GAT AGGAAACAT GACAAAGT CACT GAT AAAGGAAGT GAAAGT GGGACTT CCT GTAAT GAG 1175 

Qy 1442 CT CT C CAC AT CT AGCT GCGACAGCCAGT CT GAG GC CAGCT CT CC CCAGGAGACGGT CAT C 1501 

I I II I I I I II II II I I I I I I I I II I I I I I III II I I I I I I I I I I 

Db 1176 CT CT C CACT T C C AGTT GTGACAGCCATTC AGAGGCAAGC ACT CC C CAGGACAAC C CATCC 1235 

Qy 1502 TGTGGTCCCGTGACA CGC CAGAC CAAC AT C C AGACT CT GGAC C GT C C CAT C AAG 1555 

III I III I I I I I I I I I II I I I I I I I I I II 

Db 1236 AGTGCCCAGCAGGCAACAGCTCACCAACCTAACACTTTAACATTGGATCGCCCCTCTAAA 1295 

Qy 1556 AAGGGCCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGG 1615 

II I I I I II I I I I I I I I I I I I II I I I I I I I I III 

Db 1296 AAAGC AC CT GT AC AAT GGAT AC C C CCAC CAGACAAACGCAGAAACAGTGAACT CTT T CAG 1355 

Qy 1616 AT T CT GACT T CAGGCT C C AG GGAAT CGAACAT G AGCAGCAAAAAAAAAGCT GT TAAAGAA 1675 

I III I II I II I I I I I II II I I I I I II II 

Db 1356 AC CCT CAT CAG CAAGT C CCGGGAAACAAAT C T GT CCAAAAAGAAA GTCTGTGAG 1409



Qy 1676 AAGCT CT CAATTGAGGAGGAGCTGGAGAAAT GTATCCAGGATTT C CTAAAAAAAAAAATT 1735 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1410 AAGCT AAGT GT GGAAGAAGAAAT GAAAAAGT GT ATT CAGGATTT TAAAAAAAT CC ACAT T 1469 

Qy 1736 CCAGAT CGGTTTCCT GAGAGAAAACATCCTT GGCAAT CT GAACTTTTAAGGAAGT AT CAT 1795 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1470 C C AGAT TAT TTT CCAGAGC GCAAAC GC CAAT GGCAAT CT GAACT GTT G CAGAAGT AT GGG 1529 

Qy 1796 CTATA 1800 

I I I I 

Db 1530 TTATA 1534 



RESULT 7 
ABV99059 



ID ABV99059 standard; cDNA; 632 BP.
XX

AC ABV99059;
XX

DT 14-JAN-2003 (first entry)
XX

DE Human pancreatic cancer expressed cDNA SEQ ID NO 4467.
XX

KW Human; pancreas; cancer; gene therapy; vaccine; immune stimulant;

KW cytostatic; tumour; gene; ss.
XX

OS Homo sapiens.
XX

PN WO200260317-A2.
XX

PD 08-AUG-2002.
XX

PF 30-JAN-2002; 2002WO-US02781.
XX

PR 30-JAN-2001; 2001US-265305P.

PR 31-JAN-2001; 2001US-265682P.

PR 09-FEB-2001; 2001US-267568P.

PR 21-MAR-2001; 2001US-278651P.

PR 28-APR-2001; 2001US-287112P.

PR 16-MAY-2001; 2001US-291631P.

PR 12-JUL-2001; 2001US-305484P.

PR 20-AUG-2001; 2001US-313999P.

PR 27-NOV-2001; 2001US-333626P.
XX

PA (CORI-) CORIXA CORP.
XX

PI Benson DR, Kalos MD, Lodes MJ, Persing DH, Hepler WT, Jiang Y;
XX

DR WPI; 2002-627435/67.
XX

PT New isolated polynucleotide and pancreatic tumor polypeptides, useful

PT for diagnosing, preventing and/or treating cancer, particularly

PT pancreatic cancer
XX

PS Claim 1; SEQ ID NO 4467; 300pp + Sequence Listing; English.
XX

CC The invention relates to an isolated polynucleotide (I) comprising: (a) 

CC any of a group of over 4000 nucleotide sequences (ABV94628-ABV99145);

CC (b) complements of (a); (c) sequences consisting of at least 20

CC contiguous residues of (a); (d) sequences that hybridize to (a), under

CC moderately stringent conditions; (e) sequences having at least 75% or 90%

CC identity to (a); or (f) degenerate variants of (a). Polypeptides

CC (ABP68596-ABP68637) encoded by (I) and oligonucleotide can be used to

CC detect cancer in a patient and compositions comprising polypeptides,

CC polynucleotides, antibodies, fusion proteins, T cell populations and

CC antigen presenting cells expressing the polypeptide are useful in 

CC treating pancreatic cancer and stimulating an immune response. The 

CC polynucleotides can be used as probes or primers for nucleic acid 

CC hybridisation, in the design and preparation of ribozyme molecules for 

CC inhibiting expression of the tumour polypeptides and proteins in the 

CC tumour cells, in vaccines and for gene therapy. 

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences. 
XX 

SQ Sequence 632 BP; 178 A; 92 C; 119 G; 239 T; 4 other; 



Query Match 5.9%; Score 205; DB 24; Length 632; 

Best Local Similarity 93.4%; Pred. No. 3.4e-31;

Matches 214; Conservative 0; Mismatches 15; Indels 0; Gaps 0; 

Qy 3206 AGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATT 3265
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   15 AGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATT 74

Qy 3266 TTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGT 3325
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   75 TTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGT 134

Qy 3326 GTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACA 3385
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  135 GTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACA 194

Qy 3386 GGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434
        |||||||||||||||||||||||      |  |  |||| ||  |  ||
Db  195 GGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 243



RESULT 8 
ABV95156 

ID ABV95156 standard; cDNA; 614 BP. 
XX 

AC ABV95156; 
XX 

DT 14-JAN-2003 (first entry) 
XX 

DE Human pancreatic cancer expressed cDNA SEQ ID NO 564. 
XX 

KW Human; pancreas; cancer; gene therapy; vaccine; immuno stimulant;

KW cytostatic; tumour; gene; ss.

XX

OS Homo sapiens.
XX 



PN WO200260317-A2. 
XX 

PD 08-AUG-2002. 
XX 

PF 30-JAN-2002; 2002WO-US02781.
XX 

PR 30-JAN-2001; 2001US-265305P.

PR 31-JAN-2001; 2001US-265682P.

PR 09-FEB-2001; 2001US-267568P.

PR 21-MAR-2001; 2001US-278651P.

PR 28-APR-2001; 2001US-287112P.

PR 16-MAY-2001; 2001US-291631P.

PR 12-JUL-2001; 2001US-305484P.

PR 20-AUG-2001; 2001US-313999P.

PR 27-NOV-2001; 2001US-333626P.
XX

PA (CORI-) CORIXA CORP.
XX

PI Benson DR, Kalos MD, Lodes MJ, Persing DH, Hepler WT, Jiang Y;
XX

DR WPI; 2002-627435/67.
XX

PT New isolated polynucleotide and pancreatic tumor polypeptides, useful

PT for diagnosing, preventing and/or treating cancer, particularly

PT pancreatic cancer
XX

PS Claim 1; SEQ ID NO 564; 300pp + Sequence Listing; English.
XX

CC The invention relates to an isolated polynucleotide (I) comprising: (a)

CC any of a group of over 4000 nucleotide sequences (ABV94628-ABV99145);

CC (b) complements of (a); (c) sequences consisting of at least 20

CC contiguous residues of (a); (d) sequences that hybridize to (a), under

CC moderately stringent conditions; (e) sequences having at least 75% or 90%

CC identity to (a); or (f) degenerate variants of (a). Polypeptides

CC (ABP68596-ABP68637) encoded by (I) and oligonucleotide can be used to

CC detect cancer in a patient and compositions comprising polypeptides,

CC polynucleotides, antibodies, fusion proteins, T cell populations and

CC antigen presenting cells expressing the polypeptide are useful in

CC treating pancreatic cancer and stimulating an immune response. The

CC polynucleotides can be used as probes or primers for nucleic acid

CC hybridisation, in the design and preparation of ribozyme molecules for

CC inhibiting expression of the tumour polypeptides and proteins in the

CC tumour cells, in vaccines and for gene therapy.

CC Note: The sequence data for this patent did not form part of the printed

CC specification, but was obtained in electronic format directly from WIPO

CC at ftp.wipo.int/pub/published_pct_sequences.
XX

SQ Sequence 614 BP; 177 A; 87 C; 110 G; 236 T; 4 other;

Query Match 5.8%; Score 201; DB 24; Length 614;

Best Local Similarity 93.3%; Pred. No. 2.1e-30;

Matches 210; Conservative 0; Mismatches 15; Indels 0; Gaps 0;

Qy 3210 ACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATTTTCA 3269
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 ACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATTTTCA 60



Qy 3270 GCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGTGTGC 3329
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 GCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGTGTGC 120

Qy 3330 CAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTA 3389
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 CAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTA 180

Qy 3390 ATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434
        |||||||||||||||||||      |  |  |||| ||  |  ||
Db  181 ATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 225



RESULT 9 
ABT09812 

ID ABT09812 standard; cDNA; 2052 BP. 
XX 

AC ABT09812; 
XX 

DT 05-DEC-2002 (first entry) 
XX 

DE Polynucleotide encoding the K+beta M6 protein SEQ ID No 1. 
XX 

KW Cytostatic; cardiant; neuroprotective; immunomodulator; antimigraine;

KW sedative; gynaecological; potassium channel beta subunit; K+betaM6;

KW gastrointestinal; reproductive; neural; sleep; low DNA repair capacity;

KW hyperpotassium channel activity; cardiovascular; melatonin synthesis;

KW mammary cancer tumourigenesis; pineal gland associated disorder;

KW pulmonary disorder; immune disorder; NF-kB activity; migraine headache; 

KW low free-radical buffering capacity; delayed sleep phase syndrome; 

KW circadian cycle; melatonin secretion; cancer; gene; ss. 

XX 

OS Homo sapiens . 
XX 

PN WO200270727-A2. 
XX 

PD 12-SEP-2002. 
XX 

PF 21-FEB-2002; 2002WO-US05674 . 
XX 

PR 21-FEB-2001; 2001US-270132P . 

PR 27-MAR-2001; 2001US-278953P . 
XX 

PA (BRIM ) BRISTOL-MYERS SQUIBB CO. 
XX 

PI Feder J, Lee L, Chen J, Jackson DG, Ramanathan C, Siemers N; 

PI Chang H; 

XX 

DR WPI; 2002-713455/77. 

DR P-PSDB; ABJ10886. 
XX 

PT New polynucleotide encoding human potassium channel beta subunit 

PT polypeptide, useful for diagnosing, preventing, treating or 

PT ameliorating e.g. cancer 
XX 

PS Claim 1; Fig 1; 332pp; English. 
XX 



CC The invention relates to an isolated polynucleotide encoding a potassium 

CC channel beta subunit (K+betaM6) polypeptide or its variants. The human 

CC potassium beta subunit polynucleotide or polypeptide is useful for 

CC diagnosing, preventing, treating or ameliorating a pathological condition 

CC such as gastrointestinal, reproductive, neural, sleep, cardiovascular or 

CC pulmonary disorders, a disorder related to hyperpotassium channel 

CC activity, an immune disorder related to aberrant NF-kB activity, pineal 

CC gland associated disorders, migraine headaches, disorders associated with 

CC aberrant melatonin synthesis and/or release or with low DNA repair 

CC capacities or low free-radical buffering capacity, delayed sleep phase 

CC syndrome, aberrations in circadian cycle, mammary cancer tumourigenesis,

CC age related disorders associated with decreased melatonin secretion, or 

CC cancer. This polynucleotide sequence represents the cDNA encoding the 

CC potassium channel beta subunit (K+betaM6) protein of the invention. 

XX 

SQ Sequence 2052 BP; 380 A; 640 C; 607 G; 425 T; 0 other; 

Query Match 4.8%; Score 167; DB 24; Length 2052; 

Best Local Similarity 64.6%; Pred. No. 1.5e-23; 

Matches 267; Conservative 0; Mismatches 140; Indels 6; Gaps 1; 

Qy 967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026 

I I I I I I I III II III I I I I I I I I I I I I I I I I I 

Db 705 CACGCCGTCCCAGTCGCTGGACGGCAGCCGGCGCTCGGGCTACATCACCATCGGCTACCG 764 

Qy 1027 AGGAT CCT GC AC CT T GGGC AGAGAGGGACAG GCAGAT G C CAAGTT T C G GAGAGTT CC C C G 1086 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 765 CGGCTCCTACACCATCGGGCGGGACGCGCAGGCGGACGCCAAGTTCCGGCGAGTGGCGCG 824 

Qy 1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146 

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 825 CATCACCGTTTGCGGAAAGACGTCGCTGGCCAAGGAGGTGTTTGGGGACACCCTGAACGA 884 

Qy 1147 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 1206 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 885 AAGCCGGGACCCCGACCGTCCCCCGGAGCGCTACACCTCGCGCTATTACCTCAAGTTCAA 944

Qy 1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 945 CTTCCTGGAGCAGGCCTTCGACAAGCTGTCCGAGTCGGGCTTCCACATGGTGGCGTGCAG 1004 

Qy 1267 CT CAT C GGTGAC AGCAT CT TT CAT CAAC CAAT AT AC AGAT GACAAGAT CT GGT C 1320 

III III II III I I I I I I II II I I I I I I I I I I I I I 

Db 1005 CT C CAC GGG C ACCT GC GCCT TT GCCAGC AGC AC C GAC C AGAGCGAGGACAAGATCT GGAC 1064 

Qy 1321 AAGCT AC ACT GAAT AT GTCT T CTAC CGT GAGC CT T C CAGAT GGT CAC C CT CAC 1373 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1065 C AG CT AC ACCGAGT ACGT CT T CTG CAGGGAGT GAGCT C C C C AGAC CCC CT C GC 1117 
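
The percentage figures printed above each alignment can be cross-checked from the raw counts on the same summary lines. The sketch below is an illustration only: it assumes that Query Match is the score divided by the perfect score (3468, as given in the report header) and that Best Local Similarity is matches divided by matches plus mismatches plus indels. Both assumptions reproduce the figures printed for RESULT 9, but neither formula is stated in the report itself, and the function names are hypothetical.

```python
# Hypothetical helpers for re-deriving the summary percentages.
# The formulas are assumptions inferred from the printed figures.

PERFECT_SCORE = 3468  # "Perfect score" from the report header


def query_match_pct(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect score (assumed definition)."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches, mismatches, indels):
    """Matches as a percentage of all aligned columns (assumed definition)."""
    aligned_columns = matches + mismatches + indels
    return 100.0 * matches / aligned_columns


# Counts copied from the RESULT 9 summary lines.
qm = query_match_pct(167)
bls = best_local_similarity_pct(matches=267, mismatches=140, indels=6)
print(f"Query Match {qm:.1f}%; Best Local Similarity {bls:.1f}%")
```

The same check is a quick way to spot OCR damage in a count: a digit dropped from Matches or Mismatches would no longer reproduce the printed percentage.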



RESULT 10 
ABQ88125/C 

ID ABQ88125 standard; cDNA; 109201 BP. 
XX 

AC ABQ88125; 
XX 

DT 18-SEP-2002 (first entry) 



XX 

DE Human osteoblast differentiation related cDNA SEQ ID NO 32. 
XX 

KW Human; osteoblast; stem cell differentiation; bone tissue deposition; 

KW osteoporosis; osteopathic; ss. 

XX 

OS Homo sapiens . 
XX 

PN WO200250301-A2. 
XX 

PD 27-JUN-2002. 
XX 

PF 18-DEC-2001; 2001WO-US48276 . 
XX 

PR 18-DEC-2000; 2000US-255882P . 

PR 24-APR-2001; 2001US-285691P . 
XX 

PA (GENE-) GENE LOGIC INC. 

PA (PROC ) PROCTER & GAMBLE CO. 

XX 

PI Ji D, Axelrod DW, Cook JS, Jaiswal N, Einstein R, Houghton A; 

PI Mertz L; 

XX 

DR WPI; 2002-557663/59. 
XX 

PT Use of genes and their expression profiles associated with osteoblast 

PT differentiation for screening modulators bone formation, for diagnosing 

PT or treating e.g. osteoporosis, or as markers for the differentiation 

PT process 
XX 

PS Claim 1; SEQ ID NO 32; 78pp + Sequence Listing; English. 
XX 

CC The invention relates to genes and their expression profiles are used 

CC for: 

CC (a) screening modulators of precursor stem cell differentiation into 

CC osteoblasts, or bone tissue deposition; 

CC (b) diagnosing abnormal deposition of bone tissue, abnormal rate of 

CC osteoblast formation or osteoporosis; or 

CC (c) treating or monitoring treatment of the conditions cited in (b), or

CC monitoring the progression of bone tissue deposition. 

CC Specific conditions include postmenopausal osteoporosis, glucocorticoid 

CC osteoporosis or male osteoporosis, osteopenia, osteodystrophy, 

CC drug-induced abnormalities in bone formation or bone loss, conditions 

CC that involve altered bone metabolism (e.g. idiopathic juvenile 

CC osteoporosis), skeletal disease linked to breast cancer, mastocytosis, 

CC Fanconi syndrome or fibrous dysplasia. The present sequence is that of an 

CC osteoblast differentiation associated cDNA marker of the invention. 

CC Note: The sequence data for this patent did not form part of the printed 

CC specification, but was obtained in electronic format directly from WIPO 

CC at ftp.wipo.int/pub/published_pct_sequences.

XX 

SQ Sequence 109201 BP; 32871 A; 23488 C; 22108 G; 30734 T; 0 other; 

Query Match 4.8%; Score 167; DB 24; Length 109201; 
Best Local Similarity 64.6%; Pred. No. 3.5e-23; 

Matches 267; Conservative 0; Mismatches 140; Indels 6; Gaps 1; 



Qy 967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026 

I I I I I I I III II III I I I I I I I I I I I I I I I I I 

Db 9291 CACGCCGTCCCAGTCGCTGGACGGCAGCCGGCGCTCGGGCTACATCACCATCGGCTACCG 9232 

Qy 1027 AGGATC CTGCAC CTT GGGCAGAGAGGGAC AGGC AGAT GC CAAGT T T C GGAGAGT T C C C C G 1086 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I III 

Db 9231 CGGCTCCTACACCATCGGGCGGGACGCGCAGGCGGACGCCAAGTTCCGGCGAGTGGCGCG 9172 

Qy 1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 9171 CAT C AC C GTT T GC GGAAAGAC GTC GCT GGC CAAGGAGGT GT TTGGGGACAC CCT GAACGA 9112 

Qy 1147 AAGCAGAGAC CCT GAT CGAGC C CC AGAAAGAT AC ACCT C CAGAT T TTAT CT CAAATT CAA 1206 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 9111 AAGCCGGGACCCCGACCGTCCCCCGGAGCGCTACACCTCGCGCTATTACCTCAAGTTCAA 9052 

Qy 1207 GCAC CT GGAAAGGGCT TT T GAT AT GTT GT C AGAGT GT GGAT T CC ACAT GGT GGC CT GTAA 1266 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 9051 CTTCCTGGAGCAGGCCTTCGACAAGCTGTCCGAGTCGGGCTTCCACATGGTGGCGTGCAG 8992 

Qy 1267 CTCATCGGTGACAGCATCTTT CAT CAACCAAT AT AC AGAT GACAAGAT CT GGT C 1320 

III III II III I I I I I I II II I I I I I I I I I I I I I 

Db 8991 CTC C AC GGGC AC CT GC GC CTT T GC C AGC AGCAC C GAC C AGAGCGAGGACAAGAT CT GGAC 8932 

Qy 1321 AAGCT AC ACT GAAT AT GT CTT CT AC CGT GAGC CTT CCAGAT GGT CAC CCT C AC 1373 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 8931 C AGCTAC AC C GAGT AC GT CTT CT GCAGGGAGT GAGCT C CCC AGAC CC C CT C GC 8879 



RESULT 11 
ABQ40654 

ID ABQ40654 standard; DNA; 854 BP. 
XX 

AC ABQ40654; 
XX 

DT 12-JUL-2002 (first entry) 
XX 

DE Oligonucleotide for detecting cytosine methylation SEQ ID NO 27245. 
XX 

KW Human; cytosine methylation; 5'-CpG-3'; uracil; cytosine; diagnosis;

KW drug; side effect; cancer; central nervous system; cardiovascular;

KW gastrointestinal; respiratory system; single nucleotide polymorphism;

KW SNP; cell differentiation; ds.
XX 

OS Homo sapiens. 
XX 

PN WO200218632-A2. 
XX 

PD 07-MAR-2002 . 
XX 

PF 01-SEP-2001; 2001WO-EP10074 . 
XX 

PR 01-SEP-2000; 2000DE-1043826 . 

PR 05-SEP-2000; 2000DE-1044543 . 
XX 

PA (EPIG-) EPIGENOMICS AG. 
XX 



PI Olek A, Piepenbrock C, Berlin K, Guetig D; 
XX 

DR WPI; 2002-371829/40. 
XX 

PT Determining the degree of cytosine methylation in genomic DNA, useful 

PT for diagnosis and prognosis, comprises selective hybridization of 

PT amplicons from chemically treated DNA - 
XX 

PS Claim 12; 56pp + Sequence Listing; 56pp; German. 
XX 

CC This invention describes a novel method for determining the degree of 

CC methylation of a particular cytosine in a motif 5'-CpG-3', present in a

CC genomic sample of DNA. The sample is treated chemically to convert 

CC cytosine (C) but not methylated C, to uracil, then part of the genomic 

CC DNA that contains the target C is amplified to form a labeled amplicon. 

CC The amplicon is hybridised to two classes, each with at least one 

CC member, of oligonucleotides and/or peptide-nucleic acid (PNA) oligomers 

CC and the degree of hybridisation to both classes is determined from the 

CC label on the amplicon. From the ratio of labels hybridised to the two 

CC classes of oligomers, the degree of methylation is calculated. The method 

CC is used: (i) for diagnosis and/or prognosis of side effects of 

CC therapeutic drugs and of a wide range of diseases, e.g. cancer, disorders 

CC of the central nervous, cardiovascular, gastrointestinal and respiratory 

CC systems etc., particularly by detecting mutations or single nucleotide 

CC polymorphisms (SNP's); and (ii) for differentiation of cell or tissue 

CC types and for investigating cell differentiation. The method allows the 

CC methylation status of many C residues to be determined simultaneously. 

CC ABQ13410-ABQ54121 represent genomic DNA sequences used to illustrate the 

CC method for determining the degree of cytosine methylation described in 

CC the disclosure of the invention. 

XX 

SQ Sequence 854 BP; 131 A; 98 C; 289 G; 336 T; 0 other; 

Query Match 3.3%; Score 114.2; DB 24; Length 854; 

Best Local Similarity 59.7%; Pred. No. 3.6e-13; 

Matches 213; Conservative 0; Mismatches 138; Indels 6; Gaps 1; 

Qy 1001 TGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCA 1060 

I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I 
Db 491 TCGGGTTATATTATTATCGGTTATCGCGGTTTTTATATTATCGGGCGGGACGCGTAGGCG 550

Qy 1061 GATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAA 1120 

III I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II 
Db 551 GACGTTAAGTTTCGGCGAGTGGCGCGTATTATCGTTTGCGGAAAGACGTCGTTGGTTAAG 610 

Qy 1121 GAAGT CTTTGGAGAAACTTT GAATGAAAGCAGAGACCCT GAT CGAGCCCCAGAAAGATAC 1180 

I I I I I I I I I I I I I I I I I I I I I I I III I I I I I III III 

Db 611 GAGGT GT T TGGGGAT ATT TT GAAC GAAAGT CGGGAT TT C GAT C GTTT TT C GGAGC GT TAT 670 

Qy 1181 ACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAG 1240

I II I I I I I I I I I I I I I I I II I I I I I I I II I I I I I III 

Db 671 ATTTCGCGTTATTATTTTAAGTTTAATTTTTTGGAGTAGGTTTTCGATAAGTTGTTCGAG 730 

Qy 1241 T GT GGAT T CC ACAT GGT GGCCT GT AACT CAT C GGT GACAGC AT CTT T CATCAAC 1294 

I I I I I I I I I I I I I I I I I I I III I II I I II 

Db 731 TCGGGTTTTTATATGGTGGCGTGTAGTTTTACGGGTATTTGCGTTTTTGTTAGTAGTATC 790 



Qy 1295 CAAT ATACAGATGACAAGATCT GGT CAAGCTACACT GAAT ATGT CTT CT ACCGTGAG 1351 

I I I I I I I I I I I I I I I I I I I I I II II II II I I I I I 
Db 791 GATTAGAGCGAGGATAAGATTT GGATTAGTTAT ATC GAGT ACGTTTTTT GTAGGGAG 847 
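
The CC text of this record describes a chemical treatment that converts cytosine, but not methylated cytosine, to uracil, which is then read as T after amplification. A minimal in-silico sketch of that conversion follows. It assumes, purely for illustration, that every cytosine in a CpG dinucleotide is methylated and that all other cytosines convert; the function name and the `methylated_cpg` flag are hypothetical and do not come from the patent. The input fragment is copied from the end of the first Db row of RESULT 9.

```python
def bisulfite_convert(seq, methylated_cpg=True):
    """Return seq with unmethylated C read as T (single strand only).

    Assumption: a C directly followed by G is treated as methylated
    (and therefore kept) when methylated_cpg is True; every other C
    is converted.
    """
    seq = seq.upper()
    converted = []
    for i, base in enumerate(seq):
        if base == "C":
            in_cpg = i + 1 < len(seq) and seq[i + 1] == "G"
            converted.append("C" if (in_cpg and methylated_cpg) else "T")
        else:
            converted.append(base)
    return "".join(converted)


fragment = "TCGGGCTACATCACCATCGGCTACCG"  # from the RESULT 9 Db row
print(bisulfite_convert(fragment))                        # CpG sites kept
print(bisulfite_convert(fragment, methylated_cpg=False))  # all C converted
```

With CpG sites kept, the converted fragment coincides with the start of the Db row at position 491 in the alignment above, which is consistent with this database entry being a treated form of the same region; that relationship is an inference from the printed sequences and is not stated in the records.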



RESULT 12 
ABQ40655/C 

ID ABQ40655 standard; DNA; 854 BP. 
XX 

AC ABQ40655; 
XX 

DT 12-JUL-2002 (first entry) 
XX 

DE Oligonucleotide for detecting cytosine methylation SEQ ID NO 27246. 
XX 

KW Human; cytosine methylation; 5'-CpG-3'; uracil; cytosine; diagnosis;

KW drug; side effect; cancer; central nervous system; cardiovascular;

KW gastrointestinal; respiratory system; single nucleotide polymorphism;

KW SNP; cell differentiation; ds.
XX 

OS Homo sapiens. 
XX 

PN WO200218632-A2. 
XX 

PD 07-MAR-2002. 
XX 

PF 01-SEP-2001; 2001WO-EP10074 . 
XX 

PR 01-SEP-2000; 2000DE-1043826 . 

PR 05-SEP-2000; 2000DE-1044543 . 
XX 

PA (EPIG-) EPIGENOMICS AG. 
XX 

PI Olek A, Piepenbrock C, Berlin K, Guetig D; 
XX 

DR WPI; 2002-371829/40. 
XX 

PT Determining the degree of cytosine methylation in genomic DNA, useful 

PT for diagnosis and prognosis, comprises selective hybridization of 

PT amplicons from chemically treated DNA - 
XX 

PS Claim 12; 56pp + Sequence Listing; 56pp; German. 
XX 

CC This invention describes a novel method for determining the degree of 

CC methylation of a particular cytosine in a motif 5'-CpG-3', present in a

CC genomic sample of DNA. The sample is treated chemically to convert 

CC cytosine (C) but not methylated C, to uracil, then part of the genomic 

CC DNA that contains the target C is amplified to form a labeled amplicon. 

CC The amplicon is hybridised to two classes, each with at least one 

CC member, of oligonucleotides and/or peptide-nucleic acid (PNA) oligomers 

CC and the degree of hybridisation to both classes is determined from the 

CC label on the amplicon. From the ratio of labels hybridised to the two 

CC classes of oligomers, the degree of methylation is calculated. The method 

CC is used: (i) for diagnosis and/or prognosis of side effects of 

CC therapeutic drugs and of a wide range of diseases, e.g. cancer, disorders 

CC of the central nervous, cardiovascular, gastrointestinal and respiratory 

CC systems etc., particularly by detecting mutations or single nucleotide 



CC polymorphisms (SNP's); and (ii) for differentiation of cell or tissue 

CC types and for investigating cell differentiation. The method allows the 

CC methylation status of many C residues to be determined simultaneously. 

CC ABQ13410-ABQ54121 represent genomic DNA sequences used to illustrate the 

CC method for determining the degree of cytosine methylation described in 

CC the disclosure of the invention. 
XX 

SQ Sequence 854 BP; 336 A; 289 C; 98 G; 131 T; 0 other; 

Query Match 3.3%; Score 114.2; DB 24; Length 854; 

Best Local Similarity 59.7%; Pred. No. 3.6e-13; 

Matches 213; Conservative 0; Mismatches 138; Indels 6; Gaps 1; 

Qy 1001 T G GGGTT T C ATT ACT GT GGGT T AC AGAGGAT C CT GC AC CTT GGGC AGAGAGGGAC AGGCA 1060 

I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I 
Db 364 TCGGGTTATATTATTATCGGTTATCGCGGTTTTTATATTATCGGGCGGGACGCGTAGGCG 305 

Qy 1061 GATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAA 1120 

III I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 304 GACGTTAAGTTTCGGCGAGTGGCGCGTATTATCGTTTGCGGAAAGACGTCGTTGGTTAAG 245 

Qy 1121 GAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATAC 1180

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I III III 

Db 244 GAGGTGTTTGGGGATATTTTGAACGAAAGTCGGGATTTCGATCGTTTTTCGGAGCGTTAT 185 

Qy 1181 ACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAG 1240

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I III 

Db 184 ATT T CGCGTT AT TAT T TTAAGT T TAATT TT T T GGAGT AGGT TTT C GATAAGTT GTT C GAG 125 

Qy 1241 TGT GGATTCCACATGGTGGCCT GTAACTCAT CGGT GACAGCATCTTT CATCAAC 1294 

I I I I I I I I I I I I II I I I I I III I III I II 

Db 124 TCGGGTTTTTATATGGTGGCGTGTAGTTTTACGGGTATTTGCGTTTTTGTTAGTAGTATC 65 

Qy 1295 CAAT AT ACAGAT GACAAGAT CT GGT CAAGCT ACACT GAAT AT GT CTT CT AC CGT GAG 1351 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 64 GAT T AGAGC GAGGAT AAGAT TT GGAT TAGT TAT AT C GAGT AC GT TT T TT GT AGGGAG 8 



RESULT 13 
ABQ13668 

ID ABQ13668 standard; DNA; 1757 BP. 
XX 

AC ABQ13668; 
XX 

DT 12-JUL-2002 (first entry) 
XX 

DE Oligonucleotide for detecting cytosine methylation SEQ ID NO 259. 
XX 

KW Human; cytosine methylation; 5'-CpG-3'; uracil; cytosine; diagnosis;

KW drug; side effect; cancer; central nervous system; cardiovascular;

KW gastrointestinal; respiratory system; single nucleotide polymorphism;

KW SNP; cell differentiation; ds.
XX 

OS Homo sapiens . 
XX 

PN WO200218632-A2. 
XX 



PD 07-MAR-2002. 
XX 

PF 01-SEP-2001; 2001WO-EP10074 . 
XX 

PR 01-SEP-2000; 2000DE-1043826 . 

PR 05-SEP-2000; 2000DE-1044543 . 
XX 

PA (EPIG-) EPIGENOMICS AG. 
XX 

PI Olek A, Piepenbrock C, Berlin K, Guetig D; 
XX 

DR WPI; 2002-371829/40. 
XX 

PT Determining the degree of cytosine methylation in genomic DNA, useful 

PT for diagnosis and prognosis, comprises selective hybridization of 

PT amplicons from chemically treated DNA - 
XX 

PS Claim 12; 56pp + Sequence Listing; 56pp; German. 
XX 

CC This invention describes a novel method for determining the degree of 

CC methylation of a particular cytosine in a motif 5'-CpG-3', present in a

CC genomic sample of DNA. The sample is treated chemically to convert 

CC cytosine (C) but not methylated C, to uracil, then part of the genomic 

CC DNA that contains the target C is amplified to form a labeled amplicon. 

CC The amplicon is hybridised to two classes, each with at least one 

CC member, of oligonucleotides and/or peptide-nucleic acid (PNA) oligomers 

CC and the degree of hybridisation to both classes is determined from the 

CC label on the amplicon. From the ratio of labels hybridised to the two 

CC classes of oligomers, the degree of methylation is calculated. The method 

CC is used: (i) for diagnosis and/or prognosis of side effects of 

CC therapeutic drugs and of a wide range of diseases, e.g. cancer, disorders 

CC of the central nervous, cardiovascular, gastrointestinal and respiratory

CC systems etc., particularly by detecting mutations or single nucleotide

CC polymorphisms (SNP's); and (ii) for differentiation of cell or tissue

CC types and for investigating cell differentiation. The method allows the 

CC methylation status of many C residues to be determined simultaneously. 

CC ABQ13410-ABQ54121 represent genomic DNA sequences used to illustrate the 

CC method for determining the degree of cytosine methylation described in 

CC the disclosure of the invention. 

XX 

SQ Sequence 1757 BP; 246 A; 209 C; 640 G; 662 T; 0 other; 

Query Match 3.2%; Score 109.8; DB 24; Length 1757; 
Best Local Similarity 62.6%; Pred. No. 3.1e-12; 

Matches 171; Conservative 0; Mismatches 102; Indels 0; Gaps 0; 

Qy   1001 TGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCA 1060
          | |||||  |||| | | |||||  | || |  |  |   | ||  | || |   ||||
Db   1485 TCGGGTTATATTATTATCGGTTATCGCGGTTTTTATATTATCGGGCGGGACGCGTAGGCG 1544

Qy   1061 GATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAA 1120
          || |  ||||||||| ||||  | || ||| | ||||| ||||| ||  || ||||  ||
Db   1545 GACGTTAAGTTTCGGCGAGTGGCGCGTATTATCGTTTGCGGAAAGACGTCGTTGGTTAAG 1604

Qy   1121 GAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATAC 1180
          || || ||||| || | |||||| |||||  | ||    |||||     | ||  | ||
Db   1605 GAGGTGTTTGGGGATATTTTGAACGAAAGTCGGGATTTCGATCGTTTTTCGGAGCGTTAT 1664

Qy   1181 ACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAG 1240
          |  ||  | | |||| | || || ||     ||||   || ||| |||| |||||  |||
Db   1665 ATTTCGCGTTATTATTTTAAGTTTAATTTTTTGGAGTAGGTTTTCGATAAGTTGTTCGAG 1724

Qy   1241 TGTGGATTCCACATGGTGGCCTGTAACTCATCG 1273
          |  || ||  | |||||||| ||||  |   ||
Db   1725 TCGGGTTTTTATATGGTGGCGTGTAGTTTTACG 1757
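
The per-hit statistics printed with each alignment can be reproduced arithmetically. The sketch below is an illustration, not the search program's documented method: the perfect score of 3468 and the gap parameters (Gapop 10.0, Gapext 1.0) come from the run header, while the mismatch penalty of 0.6 is an assumption chosen only because it reproduces the printed scores. The function names are hypothetical.

```python
PERFECT_SCORE = 3468      # "Perfect score" in the run header (query length)
GAP_OPEN = 10.0           # Gapop from the run header
GAP_EXTEND = 1.0          # Gapext from the run header
MISMATCH_PENALTY = 0.6    # assumption: inferred from the printed scores


def alignment_score(matches, mismatches, gap_lengths=()):
    """Score = matches - 0.6 * mismatches - sum(Gapop + Gapext * length)."""
    gap_cost = sum(GAP_OPEN + GAP_EXTEND * n for n in gap_lengths)
    return matches - MISMATCH_PENALTY * mismatches - gap_cost


def best_local_similarity(matches, mismatches, indels=0, conservative=0):
    """Percentage of aligned columns that are exact matches."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


def query_match(score):
    """Score expressed as a percentage of the perfect score."""
    return 100.0 * score / PERFECT_SCORE


if __name__ == "__main__":
    s = alignment_score(171, 102)              # ungapped hit above
    print(round(s, 1), round(best_local_similarity(171, 102), 1),
          round(query_match(s), 1))
```

Under these assumptions the ungapped hit above (171 matches, 102 mismatches) gives the printed Score, Best Local Similarity and Query Match, and a hit with 220 matches, 159 mismatches and one gap of 6 indels gives 108.6.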



RESULT 14 
ABQ13669/c 

ID ABQ13669 standard; DNA; 1757 BP. 
XX 

AC ABQ13669; 
XX 

DT 12-JUL-2002 (first entry) 
XX 

DE Oligonucleotide for detecting cytosine methylation SEQ ID NO 260. 
XX 

KW Human; cytosine methylation; 5'-CpG-3'; uracil; cytosine; diagnosis; 

KW drug; side effect; cancer; central nervous system; cardiovascular; 

KW gastrointestinal; respiratory system; single nucleotide polymorphism; 

KW SNP; cell differentiation; ds. 
XX 

OS Homo sapiens. 
XX 

PN WO200218632-A2. 
XX 

PD 07-MAR-2002. 
XX 

PF 01-SEP-2001; 2001WO-EP10074. 
XX 

PR 01-SEP-2000; 2000DE-1043826 . 

PR 05-SEP-2000; 2000DE-1044543 . 
XX 

PA (EPIG-) EPIGENOMICS AG. 
XX 

PI Olek A, Piepenbrock C, Berlin K, Guetig D; 
XX 

DR WPI; 2002-371829/40. 
XX 

PT Determining the degree of cytosine methylation in genomic DNA, useful 

PT for diagnosis and prognosis, comprises selective hybridization of 

PT amplicons from chemically treated DNA - 
XX 

PS Claim 12; 56pp + Sequence Listing; 56pp; German. 
XX 

CC This invention describes a novel method for determining the degree of 

CC methylation of a particular cytosine in a motif 5'-CpG-3', present in a 

CC genomic sample of DNA. The sample is treated chemically to convert 

CC cytosine (C) but not methylated C, to uracil, then part of the genomic 

CC DNA that contains the target C is amplified to form a labeled amplicon. 

CC The amplicon is hybridised to two classes, each with at least one 

CC member, of oligonucleotides and/or peptide-nucleic acid (PNA) oligomers 

CC and the degree of hybridisation to both classes is determined from the 

CC label on the amplicon. From the ratio of labels hybridised to the two 



CC classes of oligomers, the degree of methylation is calculated. The method 

CC is used: (i) for diagnosis and/or prognosis of side effects of 

CC therapeutic drugs and of a wide range of diseases, e.g. cancer, disorders 

CC of the central nervous, cardiovascular, gastrointestinal and respiratory 

CC systems etc., particularly by detecting mutations or single nucleotide 

CC polymorphisms (SNP's); and (ii) for differentiation of cell or tissue 

CC types and for investigating cell differentiation. The method allows the 

CC methylation status of many C residues to be determined simultaneously. 

CC ABQ13410-ABQ54121 represent genomic DNA sequences used to illustrate the 

CC method for determining the degree of cytosine methylation described in 

CC the disclosure of the invention. 
XX 

SQ Sequence 1757 BP; 662 A; 640 C; 209 G; 246 T; 0 other; 



Query Match 3.2%; Score 109.8; DB 24; Length 1757; 

Best Local Similarity 62.6%; Pred. No. 3.1e-12; 

Matches 171; Conservative 0; Mismatches 102; Indels 0; Gaps 0; 

Qy   1001 TGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCA 1060
          | |||||  |||| | | |||||  | || |  |  |   | ||  | || |   ||||
Db    273 TCGGGTTATATTATTATCGGTTATCGCGGTTTTTATATTATCGGGCGGGACGCGTAGGCG 214

Qy   1061 GATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAA 1120
          || |  ||||||||| ||||  | || ||| | ||||| ||||| ||  || ||||  ||
Db    213 GACGTTAAGTTTCGGCGAGTGGCGCGTATTATCGTTTGCGGAAAGACGTCGTTGGTTAAG 154

Qy   1121 GAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATAC 1180
          || || ||||| || | |||||| |||||  | ||    |||||     | ||  | ||
Db    153 GAGGTGTTTGGGGATATTTTGAACGAAAGTCGGGATTTCGATCGTTTTTCGGAGCGTTAT 94

Qy   1181 ACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAG 1240
          |  ||  | | |||| | || || ||     ||||   || ||| |||| |||||  |||
Db     93 ATTTCGCGTTATTATTTTAAGTTTAATTTTTTGGAGTAGGTTTTCGATAAGTTGTTCGAG 34

Qy   1241 TGTGGATTCCACATGGTGGCCTGTAACTCATCG 1273
          |  || ||  | |||||||| ||||  |   ||
Db     33 TCGGGTTTTTATATGGTGGCGTGTAGTTTTACG 1



RESULT 15 
ABQ40656/c 

ID ABQ40656 standard; DNA; 854 BP. 
XX 

AC ABQ40656; 
XX 

DT 12-JUL-2002 (first entry) 
XX 

DE Oligonucleotide for detecting cytosine methylation SEQ ID NO 27247. 
XX 

KW Human; cytosine methylation; 5'-CpG-3'; uracil; cytosine; diagnosis; 

KW drug; side effect; cancer; central nervous system; cardiovascular; 

KW gastrointestinal; respiratory system; single nucleotide polymorphism; 

KW SNP; cell differentiation; ds. 
XX 

OS Homo sapiens. 
XX 

PN WO200218632-A2. 



XX 

PD 07-MAR-2002 . 
XX 

PF 01-SEP-2001; 2001WO-EP10074 . 
XX 

PR 01-SEP-2000; 2000DE-1043826 . 

PR 05-SEP-2000; 2000DE-1044543 . 
XX 

PA (EPIG-) EPIGENOMICS AG. 
XX 

PI Olek A, Piepenbrock C, Berlin K, Guetig D; 
XX 

DR WPI; 2002-371829/40. 
XX 

PT Determining the degree of cytosine methylation in genomic DNA, useful 

PT for diagnosis and prognosis, comprises selective hybridization of 

PT amplicons from chemically treated DNA - 
XX 

PS Claim 12; 56pp + Sequence Listing; 56pp; German. 
XX 

CC This invention describes a novel method for determining the degree of 



CC methylation of a particular cytosine in a motif 5'-CpG-3', present in a 

CC genomic sample of DNA. The sample is treated chemically to convert 

CC cytosine (C) but not methylated C, to uracil, then part of the genomic 

CC DNA that contains the target C is amplified to form a labeled amplicon. 

CC The amplicon is hybridised to two classes, each with at least one 

CC member, of oligonucleotides and/or peptide-nucleic acid (PNA) oligomers 

CC and the degree of hybridisation to both classes is determined from the 

CC label on the amplicon. From the ratio of labels hybridised to the two 

CC classes of oligomers, the degree of methylation is calculated. The method 

CC is used: (i) for diagnosis and/or prognosis of side effects of 

CC therapeutic drugs and of a wide range of diseases, e.g. cancer, disorders 

CC of the central nervous, cardiovascular, gastrointestinal and respiratory 

CC systems etc., particularly by detecting mutations or single nucleotide 

CC polymorphisms (SNP's); and (ii) for differentiation of cell or tissue 

CC types and for investigating cell differentiation. The method allows the 



CC methylation status of many C residues to be determined simultaneously. 

CC ABQ13410-ABQ54121 represent genomic DNA sequences used to illustrate the 

CC method for determining the degree of cytosine methylation described in 

CC the disclosure of the invention. 
XX 

SQ Sequence 854 BP; 132 A; 98 C; 302 G; 322 T; 0 other; 

Query Match 3.1%; Score 108.6; DB 24; Length 854; 

Best Local Similarity 57.1%; Pred. No. 4.6e-12; 

Matches 220; Conservative 0; Mismatches 159; Indels 6; Gaps 1; 

Qy 967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026 

I I I I I I I III I I I I I I II I I I I I I II I I I I 

Db 398 C ACGCC GT CC CAATC GCT AAACGACAAC C GACGCT C GAACTAC AT C ACC AT CGACT AC C G 339 

Qy 1027 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 1086 

I I I I I I I I I I I I I I I II I MINIMI! Ml I II 

Db 338 C GACTCCT ACACC AT C GAAC GAAAC GC GCAAAC GAAC GCCAAATT CCGAC GAATAAC GC G 279 

Qy 1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146 

II I I I I III I II I I I I I I I I I I I I I I I I I I I I 



Db 278 CATCACCGTTTACGAAAAAACGTCGCTAACCAAAAAAATATTTAAAAACACCCTAAACGA 219 



Qy 1147 AAGC AGAGAC CCT GAT CGAGC CCCAGAAAGAT ACAC CT C CAGAT T T T AT CT CAAATT CAA 1206 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 218 AAACCGAAACCCCGACCGTCCCCCGAAACGCTACACCTCGCGCTATTACCTCAAATTCAA 159 

Qy 1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266 

III II I I I I I I I I I I I I I I I I I I I I I II II 

Db 158 CT T C CTAAAACAAAC CTT C GACAAACT AT C C GAAT C GAACT T CC ACATAATAAC GT ACAA 99 

Qy 12 67 CT CAT C GGT GACAGC AT CT TT CAT CAAC CAAT ATAC AGAT GACAAGAT CT GGT C 1320 

III II II III I I I I I I II II I I I I I I I I I 

Db 98 CT CCACGAACACCTACGCCTTTACCAACAACACCGACCAAAACGAAAACAAAAT CTAAAC 39 

Qy 1321 AAGCTACACT GAAT AT GT CTTCTAC 1345 

I I I I I I I I I I II I I I I I I I I I 
Db 38 CAACT AC AC CGAAT AC GT CTT CT AC 14 
Gapop 10.0 , Gapext 1.0 



Searched: 569978 seqs, 220691566 residues 

Total number of hits satisfying chosen parameters: 1139956 



Minimum DB seq length: 
Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 
                 Maximum Match 100% 
                 Listing first 45 summaries 

Database : Issued_Patents_NA:* 
1: /cgn2_6/ptodata/2/ina/5A_COMB.seq:* 
2: /cgn2_6/ptodata/2/ina/5B_COMB.seq:* 
3: /cgn2_6/ptodata/2/ina/6A_COMB.seq:* 
4: /cgn2_6/ptodata/2/ina/6B_COMB.seq:* 
5: /cgn2_6/ptodata/2/ina/PCTUS_COMB.seq:* 
6: /cgn2_6/ptodata/2/ina/backfiles1.seq:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 



Result        Query
  No.   Score Match Length  DB  ID                   Description

    1      74   2.1    664   4  US-09-904-615-66     Sequence 66, Appl
    2    73.8   2.1   1091   4  US-09-328-965-1      Sequence 1, Appli
c   3    73.4   2.1   4055   4  US-09-620-312D-706   Sequence 706, App
    4    71.2   2.1   1701   4  US-09-996-243-114    Sequence 114, App
    5    69.2   2.0   1441   3  US-08-821-994-63     Sequence 63, Appl
    6    69.2   2.0   2246   4  US-09-363-708-3      Sequence 3, Appli
    7    69.2   2.0   2246   4  US-09-083-587-3      Sequence 3, Appli
    8    69.2   2.0   2406   4  US-09-594-506-37     Sequence 37, Appl
    9      69   2.0   2202   4  US-09-465-558-59     Sequence 59, Appl
   10    68.8   2.0   1147   1  US-08-665-716-1      Sequence 1, Appli
   11    68.6   2.0   1736   3  US-09-182-816-22     Sequence 22, Appl
c  12    68.6   2.0   1736   3  US-09-182-816-24     Sequence 24, Appl
   13    68.6   2.0   1736   3  US-09-471-528-22     Sequence 22, Appl
c  14    68.6   2.0   1736   3  US-09-471-528-24     Sequence 24, Appl
   15    68.6   2.0   1736   3  US-09-634-530-22     Sequence 22, Appl
c  16    68.6   2.0   1736   3  US-09-634-530-24     Sequence 24, Appl
   17    68.4   2.0    593   4  US-09-904-615-59     Sequence 59, Appl
   18    68.4   2.0   2323   4  US-09-149-476-24     Sequence 24, Appl
   19    68.4   2.0   2806   4  US-09-653-839-9      Sequence 9, Appli
   20    68.4   2.0   3848   3  US-09-112-096-28     Sequence 28, Appl
   21    68.4   2.0   5668   3  US-09-112-096-14     Sequence 14, Appl
   22      68   2.0   1098   3  US-09-248-335-35     Sequence 35, Appl
   23      68   2.0   2447   2  US-09-014-969-14     Sequence 14, Appl
   24    67.6   1.9   5503   2  US-08-726-012B-1     Sequence 1, Appli
c  25    67.4   1.9    260   2  US-08-520-678A-29    Sequence 29, Appl
c  26    67.4   1.9    260   3  US-08-897-126-29     Sequence 29, Appl
   27    67.4   1.9    746   3  US-09-013-810-1      Sequence 1, Appli
   28    67.2   1.9   1445   4  US-09-814-951A-1     Sequence 1, Appli
   29    67.2   1.9   2320   4  US-09-202-904A-13    Sequence 13, Appl
   30      67   1.9   3275   4  US-09-370-838-151    Sequence 151, App
   31    66.8   1.9   2852   3  US-09-027-137-2      Sequence 2, Appli
   32    66.8   1.9   2852   4  US-09-344-441-2      Sequence 2, Appli
   33    66.8   1.9   3238   3  US-08-123-934A-5     Sequence 5, Appli
   34    66.8   1.9   3238   5  PCT-US94-10080-5     Sequence 5, Appli
   35    66.6   1.9   1507   4  US-09-453-323-1      Sequence 1, Appli
   36    66.6   1.9   3334   4  US-09-996-243-288    Sequence 288, App
   37    66.6   1.9   5173   1  US-08-242-677-1      Sequence 1, Appli
   38    66.4   1.9   1249   4  US-09-461-325-128    Sequence 128, App
c  39    66.4   1.9   1260   4  US-09-461-325-93     Sequence 93, Appl
   40    66.4   1.9   2665   4  US-08-971-089-5      Sequence 5, Appli
   41    66.4   1.9   2718   4  US-09-667-135-1      Sequence 1, Appli
   42    66.4   1.9   2773   4  US-09-996-243-178    Sequence 178, App
   43    66.2   1.9   1100   3  US-07-861-458C-4     Sequence 4, Appli
   44    66.2   1.9   2136   4  US-09-996-243-302    Sequence 302, App
   45    66.2   1.9   2218   4  US-09-016-434-1157   Sequence 1157, Ap
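
For downstream processing, a summary row can be split into fields with a small parser. The following is a sketch under assumptions: the field names and the `SummaryRow` type are hypothetical, a row is taken to be whitespace-separated in the order result number, score, query match, length, DB, ID, description, and a leading `c` is read as a flag for hits whose RESULT header carries a `/c` suffix (their database coordinates run in descending order in the alignments).

```python
import re
from typing import NamedTuple, Optional


class SummaryRow(NamedTuple):
    complement: bool     # True when the row starts with the "c" flag
    result_no: int
    score: float
    query_match: float   # percent
    length: int
    db: int
    seq_id: str
    description: str     # truncated in the listing; kept as printed


ROW_RE = re.compile(
    r"^\s*(?P<c>c)?\s*(?P<no>\d+)\s+(?P<score>\d+(?:\.\d+)?)\s+"
    r"(?P<qm>\d+(?:\.\d+)?)\s+(?P<len>\d+)\s+(?P<db>\d+)\s+"
    r"(?P<id>\S+)\s+(?P<desc>.*\S)\s*$"
)


def parse_summary_row(line: str) -> Optional[SummaryRow]:
    """Return the parsed row, or None for headers and blank lines."""
    m = ROW_RE.match(line)
    if m is None:
        return None
    return SummaryRow(
        complement=m.group("c") is not None,
        result_no=int(m.group("no")),
        score=float(m.group("score")),
        query_match=float(m.group("qm")),
        length=int(m.group("len")),
        db=int(m.group("db")),
        seq_id=m.group("id"),
        description=m.group("desc"),
    )


if __name__ == "__main__":
    print(parse_summary_row(
        "c 3 73.4 2.1 4055 4 US-09-620-312D-706 Sequence 706, App"))
```

Returning `None` for non-matching lines lets the same function be mapped over the whole SUMMARIES section without first stripping the column headers.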



ALIGNMENTS 



RESULT 1 

US-09-904-615-66 

; Sequence 66, Application US/09904615 

; Patent No. 6566325 

; GENERAL INFORMATION: 

; APPLICANT: Rosen et al. 

; TITLE OF INVENTION: 49 Human Secreted Proteins 

; FILE REFERENCE: PZ032P1 

; CURRENT APPLICATION NUMBER: US/09/904,615 

; CURRENT FILING DATE: 2001-07-16 

; PRIOR APPLICATION NUMBER: 09/511,554 

; PRIOR FILING DATE: 2000-02-23 

; PRIOR APPLICATION NUMBER: 60/097,917 

; PRIOR FILING DATE: 1998-08-25 

; PRIOR APPLICATION NUMBER: 60/098,634 

; PRIOR FILING DATE: 1998-08-31 

; NUMBER OF SEQ ID NOS : 170 

; SOFTWARE: PatentIn Ver. 2.0 



SEQ ID NO 66 
LENGTH: 664 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE : 
NAME/KEY: SITE 
LOCATION: (31) 

OTHER INFORMATION: n equals a,t,g, or c 
NAME/KEY: SITE 
LOCATION: (63) 

OTHER INFORMATION: n equals a,t,g, or c 
US-09-904-615-66 

Query Match 2.1%; Score 74; DB 4; Length 664; 

Best Local Similarity 64.0%; Pred. No. 2.3e-08; 

Matches 110; Conservative 1; Mismatches 61; Indels 0; Gaps 0; 

Qy 3297 CAAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCG 3356 

III III II I I I I I II I I I I I I I I I I I I 

Db 487 CATAGTGTAAAAATTTATATTATTGTGAGGTTTTTTGTCTTTTTTTTTTTTTTTTTTTTT 546 

Qy 3357 TGTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAA 3416 

I I I I I I I I I I I I I I I I I I I I I II : I II I I I I I 
Db 547 GGTATATTGCTGTATCTACTTTAACTTCCAGAAATAAACGTTATATRGGAAAAAAAAAAA 606 

Qy 3417 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 607 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 658 



RESULT 2 
US-09-328-965-1 

; Sequence 1, Application US/09328965 
; Patent No. 6501008 
; GENERAL INFORMATION: 

APPLICANT: Nevins, Donald J. 
; APPLICANT: Simmons, Carl 

; APPLICANT: The Regents of the University of California 

TITLE OF INVENTION: Endo- and Exo-Glucanases and Gene 
; FILE REFERENCE: 02307O-096600US 
; CURRENT APPLICATION NUMBER: US/09/328,965 
; CURRENT FILING DATE: 1999-06-09 
; EARLIER APPLICATION NUMBER: US 60/088,780 
; EARLIER FILING DATE: 1998-06-10 
; NUMBER OF SEQ ID NOS : 3 
; SOFTWARE: PatentIn Ver. 2.1 
; SEQ ID NO 1 

LENGTH: 1091 

TYPE: DNA 

ORGANISM: Zea mays 

FEATURE : 

; OTHER INFORMATION: maize coleoptile endo-1,3;1,4-beta glucanase cDNA 
; FEATURE : 

NAME/KEY: CDS 

LOCATION: (68)..(979) 
; OTHER INFORMATION: endo-1,3;1,4-beta glucanase 
US-09-328-965-1 



Query Match 2.1%; Score 73.8; DB 4; Length 1091; 

Best Local Similarity 65.5%; Pred. No. 3.2e-08; 

Matches 108; Conservative 0; Mismatches 57; Indels 0; Gaps 0; 



Qy 3304 TAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGT 3363 

I I I I I I I I I I I I I I III I I I I I I I I I I I I I 

Db 918 TAATTTCCCTCATTTTTTTTGTCTCTATGTATTTCTTTTCTTTTCTTTTTGCTTTTTTAT 977 

Qy 3364 GC AT GT GT GT AT GT GT AT C ACAGGTAAT AAAGGCAATT GGAT GAT TAAAAAAAAAAAAAA 3423 

I I III I I II I I I I I I II I I I I I I I I I I I I I I I I 

Db 978 GAT CGCAATAAAGTTCAGTAGGGGTAAAAAAAAAAAAAAAAAAAAAAAAAAA7WWVAAA 1037 

Qy 3424 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1038 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1082 



RESULT 3 

US-09-620-312D-706/c 

; Sequence 706, Application US/09620312D 
; Patent No. 6569662 
; GENERAL INFORMATION: 



APPLICANT: Tang, Y. Tom 
APPLICANT: Liu, Chenghua 
APPLICANT: Asundi, Vinod 
APPLICANT: Zhang, Jie 
APPLICANT: Ren, Feiyan 
APPLICANT: Chen, Rui-hong 
APPLICANT: Zhao, Qing A. 
APPLICANT: Wehrman, Tom 
APPLICANT: Xue, Aidong J. 
APPLICANT: Yang, Yonghong 
APPLICANT: Wang, Jian-Rui 
APPLICANT: Zhou, Ping 
APPLICANT: Ma, Yunqing 
APPLICANT: Wang, Dunrui 
APPLICANT: Wang, Zhiwei 
APPLICANT: John Tillinghast 
APPLICANT: Drmanac, Radoje T. 



TITLE OF INVENTION: Novel Nucleic Acids and 
TITLE OF INVENTION: Polypeptides 
FILE REFERENCE: 784CIP2B 

CURRENT APPLICATION NUMBER: US/09/620,312D 
CURRENT FILING DATE: 2000-07-19 
PRIOR APPLICATION NUMBER: 09/552,317 
PRIOR FILING DATE: 2000-04-25 
PRIOR APPLICATION NUMBER: 09/488,725 
PRIOR FILING DATE: 2000-01-21 
NUMBER OF SEQ ID NOS : 1105 
SOFTWARE: pt_FL_genes Version 1.0 
SEQ ID NO 706 
LENGTH: 4055 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE : 
NAME/KEY: CDS 
LOCATION: (2515)..(3519) 
US-09-620-312D-706 



Query Match 2.1%; Score 73.4; DB 4; Length 4055; 

Best Local Similarity 58.4%; Pred. No. 6.9e-08; 

Matches 128; Conservative 0; Mismatches 91; Indels 0; Gaps 0; 
Qy 3250 AAAGAC CAGT TT T AT T T T C AGCATT C CT CAT GCAT T T C AGT GGTAACCAAAAAATAATT T 3309 

III II I I I I I I I I I II I I II I I I I Mill 

Db 500 AAAAGCAAAAT GT GT TT T C AGATTT GT T ACT TTAATAAAGGTT AT CCAT AC CAATAAAAA 441 

Qy 3310 GTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGT 3369 

Mil III I I I II I I I M I II I I I I I 

Db 440 GT GT ACAACACAGCAT T T T CT GTTAAAT TAT T ATT GGT TTT CAGTT GT AATT T GGT ATT T 381 

Qy 3370 GT GT AT GT GTAT CACAGGTAATAAAGGCAATTGGAT GATTAAAAAAAAAAAAAAAAAAAA 3429 

I I I I I I I I I I M II I I I I M I II I M I II I I M I I 

Db 380 TTTCTGGCATGCGTTTATTAATTTATTAAATTGGCTTTTAGAAAAAAAAAA7VAAAAAAAA 321 

Qy 3430 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

M I II I I I I M II I M I II I I II I I M II I I I I I I I II I 
Db 320 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 282 



RESULT 4 

US-09-996-243-114 

Sequence 114, Application US/09996243 
Patent No. 6478825 
GENERAL INFORMATION: 
APPLICANT: Ashkenazi, Avi J. 
APPLICANT: Baker, Kevin P. 
APPLICANT: Botstein, David 
APPLICANT: Desnoyers, Luc 
APPLICANT: Eaton, Dan L. 
APPLICANT: Ferrara, Napoleone 
APPLICANT: Fong, Sherman 
APPLICANT: Gerber, Hanspeter 
APPLICANT: Gerritsen, Mary E. 
APPLICANT: Goddard, Audrey 
APPLICANT: Godowski, Paul J. 
APPLICANT: Grimaldi, J. Christopher 
APPLICANT: Gurney, Austin L. 
APPLICANT: Kljavin, Ivar J. 
APPLICANT: Napier, Mary A. 
APPLICANT: Pan, James 
APPLICANT: Paoni, Nicholas F. 
APPLICANT: Roy, Margaret Ann 
APPLICANT: Stewart, Timothy A. 
APPLICANT: Tumas, Daniel 
APPLICANT: Watanabe, Colin K. 
APPLICANT: Williams, P. Mickey 
APPLICANT: Wood, William I. 
APPLICANT: Zhang, Zemin 

TITLE OF INVENTION: Secreted and Transmembrane Polypeptides and Nucleic 
TITLE OF INVENTION: Acids Encoding the Same 
FILE REFERENCE: P2730P1C13 

CURRENT APPLICATION NUMBER: US/09/996,243 



CURRENT FILING DATE: 2001-11-14 
PRIOR APPLICATION NUMBER: 60/049787 
PRIOR FILING DATE: 1997-06-16 
PRIOR APPLICATION NUMBER: 60/062250 
PRIOR FILING DATE: 1997-10-17 
PRIOR APPLICATION NUMBER: 60/065186 
PRIOR FILING DATE: 1997-11-12 
PRIOR APPLICATION NUMBER: 60/065311 
PRIOR FILING DATE: 1997-11-13 
PRIOR APPLICATION NUMBER: 60/066770 
PRIOR FILING DATE: 1997-11-24 
PRIOR APPLICATION NUMBER: 60/075945 
PRIOR FILING DATE: 1998-02-25 
PRIOR APPLICATION NUMBER: 60/078910 
PRIOR FILING DATE: 1998-03-20 
PRIOR APPLICATION NUMBER: 60/083322 
PRIOR FILING DATE: 1998-04-28 
PRIOR APPLICATION NUMBER: 60/084600 
PRIOR FILING DATE: 1998-05-07 
PRIOR APPLICATION NUMBER: 60/087106 
PRIOR FILING DATE: 1998-05-28 
PRIOR APPLICATION NUMBER: 60/087607 
PRIOR FILING DATE: 1998-06-02 
PRIOR APPLICATION NUMBER: 60/087609 
PRIOR FILING DATE: 1998-06-02 
PRIOR APPLICATION NUMBER: 60/087759 
PRIOR FILING DATE: 1998-06-02 
PRIOR APPLICATION NUMBER: 60/087827 
PRIOR FILING DATE: 1998-06-03 
PRIOR APPLICATION NUMBER: 60/088021 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088025 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088026 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088028 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088029 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088030 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088033 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088326 
PRIOR FILING DATE: 1998-06-04 
PRIOR APPLICATION NUMBER: 60/088167 
PRIOR FILING DATE: 1998-06-05 
PRIOR APPLICATION NUMBER: 60/088202 
PRIOR FILING DATE: 1998-06-05 
PRIOR APPLICATION NUMBER: 60/088212 
PRIOR FILING DATE: 1998-06-05 
PRIOR APPLICATION NUMBER: 60/088217 
PRIOR FILING DATE: 1998-06-05 
PRIOR APPLICATION NUMBER: 60/088655 
PRIOR FILING DATE: 1998-06-09 
PRIOR APPLICATION NUMBER: 60/088734 
PRIOR FILING DATE: 1998-06-10 



PRIOR APPLICATION NUMBER: 60/088738 
PRIOR FILING DATE: 1998-06-10 
PRIOR APPLICATION NUMBER: 60/088742 
PRIOR FILING DATE: 1998-06-10 
PRIOR APPLICATION NUMBER: 60/088810 
PRIOR FILING DATE: 1998-06-10 
PRIOR APPLICATION NUMBER: 60/088824 
PRIOR FILING DATE: 1998-06-10 
PRIOR APPLICATION NUMBER: 60/088826 
PRIOR FILING DATE: 1998-06-10 
PRIOR APPLICATION NUMBER: 60/088858 
PRIOR FILING DATE: 1998-06-11 
PRIOR APPLICATION NUMBER: 60/088861 
PRIOR FILING DATE: 1998-06-11 
PRIOR APPLICATION NUMBER: 60/088876 
PRIOR FILING DATE: 1998-06-11 
PRIOR APPLICATION NUMBER: 60/089105 
PRIOR FILING DATE: 1998-06-12 
PRIOR APPLICATION NUMBER: 60/089440 
PRIOR FILING DATE: 1998-06-16 
PRIOR APPLICATION NUMBER: 60/089512 
PRIOR FILING DATE: 1998-06-16 
PRIOR APPLICATION NUMBER: 60/089514 
PRIOR FILING DATE: 1998-06-16 
PRIOR APPLICATION NUMBER: 60/089532 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089538 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089598 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089599 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089600 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089653 
PRIOR FILING DATE: 1998-06-17 
PRIOR APPLICATION NUMBER: 60/089801 
PRIOR FILING DATE: 1998-06-18 
PRIOR APPLICATION NUMBER: 60/089907 
PRIOR FILING DATE: 1998-06-18 
PRIOR APPLICATION NUMBER: 60/089908 
PRIOR FILING DATE: 1998-06-18 
PRIOR APPLICATION NUMBER: 60/089947 
PRIOR FILING DATE: 1998-06-19 
PRIOR APPLICATION NUMBER: 60/089948 
PRIOR FILING DATE: 1998-06-19 
PRIOR APPLICATION NUMBER: 60/089952 
PRIOR FILING DATE: 1998-06-19 
PRIOR APPLICATION NUMBER: 60/090246 
PRIOR FILING DATE: 1998-06-22 
PRIOR APPLICATION NUMBER: 60/090252 
PRIOR FILING DATE: 1998-06-22 
PRIOR APPLICATION NUMBER: 60/090254 
PRIOR FILING DATE: 1998-06-22 
PRIOR APPLICATION NUMBER: 60/090349 
PRIOR FILING DATE: 1998-06-23 
PRIOR APPLICATION NUMBER: 60/090355 



PRIOR FILING DATE: 1998-06-23 
PRIOR APPLICATION NUMBER: 60/090429 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090431 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090435 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090444 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090445 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090472 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090535 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090540 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090542 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090557 
PRIOR FILING DATE: 1998-06-24 
PRIOR APPLICATION NUMBER: 60/090676 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090678 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090690 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090694 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090695 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090696 
PRIOR FILING DATE: 1998-06-25 
PRIOR APPLICATION NUMBER: 60/090862 
PRIOR FILING DATE: 1998-06-26 
PRIOR APPLICATION NUMBER: 60/090863 
PRIOR FILING DATE: 1998-06-26 
PRIOR APPLICATION NUMBER: 60/091360 
PRIOR FILING DATE: 1998-07-01 
PRIOR APPLICATION NUMBER: 60/091478 
PRIOR FILING DATE: 1998-07-02 
PRIOR APPLICATION NUMBER: 60/091544 
PRIOR FILING DATE: 1998-07-01 
PRIOR APPLICATION NUMBER: 60/091519 
PRIOR FILING DATE: 1998-07-02 
PRIOR APPLICATION NUMBER: 60/091626 
PRIOR FILING DATE: 1998-07-02 
PRIOR APPLICATION NUMBER: 60/091633 
PRIOR FILING DATE: 1998-07-02 
PRIOR APPLICATION NUMBER: 60/091978 
PRIOR FILING DATE: 1998-07-07 
PRIOR APPLICATION NUMBER: 60/091982 
PRIOR FILING DATE: 1998-07-07 
PRIOR APPLICATION NUMBER: 60/092182 
PRIOR FILING DATE: 1998-07-09 

Query Match 2.1%; Score 71.2; DB 4; Length 1701; 



Best Local Similarity 73.4%; Pred. No. 1.6e-07; 

Matches 91; Conservative 0; Mismatches 33; Indels 0; Gaps 0; 

Qy 3345 TGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGA 3404 

I I I I I I I I I I I I I I I III I I I I I I I III 

Db 1563 TTTGTTACTTTTTCTTTGCTAATTTGGAAGATTAACTCATTTTTAATAAAATTATGTCTA 1622 

Qy 3405 T GATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 34 64 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1623 AGATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1682 

Qy 3465 AAAA 3468 

I I I I 

Db 1683 AAAA 1686 



RESULT 5 

US-08-821-994-63 

; Sequence 63, Application US/08821994A 

; Patent No. 6228643 

; GENERAL INFORMATION: 

; APPLICANT: Greenland, Andrew J 

; APPLICANT: Thomas, Didier RP 

APPLICANT: Jepson, Ian 
; TITLE OF INVENTION: Promoters 
; FILE REFERENCE: PPD 50108 

; CURRENT APPLICATION NUMBER: US/08/821,994A 

; CURRENT FILING DATE: 1997-03-22 

; EARLIER APPLICATION NUMBER: PCT/GB97/00729 

; EARLIER FILING DATE: 1997-03-18 

; EARLIER APPLICATION NUMBER: GB 9606062.9 

; EARLIER FILING DATE: 1996-03-22 

; NUMBER OF SEQ ID NOS : 89 

; SOFTWARE: PatentIn Ver. 2.1 

; SEQ ID NO 63 

LENGTH: 1441 

TYPE: DNA 
; ORGANISM: Brassica napus 
US-08-821-994-63 

Query Match 2.0%; Score 69.2; DB 3; Length 1441; 

Best Local Similarity 53.8%; Pred. No. 4.5e-07; 

Matches 143; Conservative 0; Mismatches 123; Indels 0; Gaps 0; 



Qy 3203 CTAAGGTACCAATAGCT CTTT CATAGACTT GT GCTACAAGAAGGTTAAAAGACCAGTTTT 3262 

I I I I I I I I I III I I I I I I I I I I I III 

Db 1169 CT CAT GC AGT AAT CAAATT GGGATT GT T ATAAGT TAAATTAAT CTT GT AT TAT T GTTT GT 1228 

Qy 3263 ATTTT CAGC AT T CCT C AT GC ATTT C AGT GGT AAC CAAAAAATAAT T T GT CAATTAAT AGT 3322 

I I I II I I I I I I I I I I I I I I I I I I I I I 

Db 1229 AT GT AT AGT AT TT C GAAAAAAATT GAT TC ACC AT AGG GATTTAAT CT GT AT AAAT CT CT A 1288 

Qy 3323 TGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATC 3382 

I I I I I II I I I I I II III I I 

Db 1289 TGTTGGTCAATATCATTTCATTCAAAGAATATTTGCTTTGGCTTGATTATGTATTAAGAG 1348 

Qy   3383 ACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3442
          | |  |||||||    ||    |  | |||||||||||||||||||||||||||||||||
Db   1349 AAATATAATAAAAATGATATATTTCTCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1408

Qy   3443 AAAAAAAAAAAAAAAAAAAAAAAAAA 3468
          ||||||||||||||||||||||||||
Db   1409 AAAAAAAAAAAAAAAAAAAAAAAAAA 1434



RESULT 6 
US-09-363-708-3 

; Sequence 3, Application US/09363708 
; Patent No. 6399747 
; GENERAL INFORMATION: 

APPLICANT: Schmandt, et al. 

TITLE OF INVENTION: NOVEL SHC BINDING PROTEIN 

NUMBER OF SEQUENCES: 12 

CORRESPONDENCE ADDRESS: 

ADDRESSEE: Marshall, O'Toole, Gerstein, Murray & Borun 
; STREET: 233 South Wacker Drive/6300 Sears Tower 

; CITY: Chicago 

; STATE: Illinois 

; COUNTRY: United States of America 

; ZIP: 60606-6402 

; COMPUTER READABLE FORM: 

; MEDIUM TYPE: Floppy disk 

COMPUTER: IBM PC compatible 

OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/363,708 
; FILING DATE: 

CLASSIFICATION: 
; ATTORNEY/ AGENT INFORMATION: 
; NAME: Clough, David W. 

; REGISTRATION NUMBER: 36,107 

; REFERENCE/ DOCKET NUMBER: 01017/34451 

; TELECOMMUNICATION INFORMATION: 

TELEPHONE: (312) 474-6300 

TELEFAX: (312) 474-0448 
; INFORMATION FOR SEQ ID NO: 3: 

SEQUENCE CHARACTERISTICS: 
; LENGTH: 2246 base pairs 

; TYPE: nucleic acid 

STRANDEDNESS: single 
; TOPOLOGY: linear 

MOLECULE TYPE: cDNA 

DESCRIPTION: /desc = "mouse PAL cDNA" 
US-09-363-708-3 

Query Match 2.0%; Score 69.2; DB 4; Length 2246; 

Best Local Similarity 64.2%; Pred. No. 5.4e-07; 

Matches 104; Conservative 0; Mismatches 58; Indels 0; Gaps 0; 

Qy 3307 TTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCA 3366 

II II I II III I I II II II I I I I I I I I I I 

Db 2054 TTGAACATGTCTTAAGTATGCTGCTTATATACTTTGCTTCATTTGCTTCATGGCTGTGTA 2113 



Qy 3367 T GT GT GT AT GT GT AT C ACAGGTAATAAAGGCAAT T GGAT GAT T AAAAAAAAAAAAAAAAA 3426 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 2114 T T AT ATAAAGT GT ACT T GACCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2173 

Qy 3427 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 2174 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2215 



RESULT 7 
US-09-083-587-3 

; Sequence 3, Application US/09083587 
; Patent No. 6492138 
; GENERAL INFORMATION: 

APPLICANT: Schmandt, et al . 

TITLE OF INVENTION: NOVEL SHC BINDING PROTEIN 
NUMBER OF SEQUENCES: 12 
CORRESPONDENCE ADDRESS: 
; ADDRESSEE: Marshall, O'Toole, Gerstein, Murray & Borun 

STREET: 233 South Wacker Drive/6300 Sears Tower 
; CITY: Chicago 

STATE: Illinois 

COUNTRY: United States of America 
ZIP: 60606-6402 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: PatentIn Release #1.0, Version #1.30 

CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/083,587 
FILING DATE: 
; CLASSIFICATION: 

ATTORNEY/AGENT INFORMATION: 
NAME: Clough, David W. 
; REGISTRATION NUMBER: 36,107 

REFERENCE/ DOCKET NUMBER: 01017/34451 
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (312) 474-6300 
TELEFAX: (312) 474-0448 
; INFORMATION FOR SEQ ID NO: 3: 

SEQUENCE CHARACTERISTICS: 
; LENGTH: 2246 base pairs 

; TYPE: nucleic acid 

; STRANDEDNESS: single 

; TOPOLOGY: linear 

MOLECULE TYPE: cDNA 
; DESCRIPTION: /desc = "mouse PAL cDNA" 

US-09-083-587-3 

Query Match 2.0%; Score 69.2; DB 4; Length 2246; 

Best Local Similarity 64.2%; Pred. No. 5.4e-07; 

Matches 104; Conservative 0; Mismatches 58; Indels 0; Gaps 0; 

Qy 3307 TTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCA 3366 

 II II I II III I I II II III I I I I I I I I I 

Db 2054 TTGAACATGTCTTAAGTATGCTGCTTATATACTTTGCTTCATTTGCTTCATGGCTGTGTA 2113 



Qy 3367 TGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAA 3426

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

Db 2114 TTATATAAAGTGTACTTGACCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2173

Qy 3427 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 2174 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2215 



RESULT 8
US-09-594-506-37 

; Sequence 37, Application US/09594506
; Patent No. 6512164
; GENERAL INFORMATION:
; APPLICANT: Famodu, Omolayo O.
; APPLICANT: Rafalski, J. Antoni
; TITLE OF INVENTION: Thiamine Biosynthetic Enzymes
; FILE REFERENCE: BB1372 US NA
; CURRENT APPLICATION NUMBER: US/09/594,506
; CURRENT FILING DATE: 2000-06-15
; PRIOR APPLICATION NUMBER: 60/139,556
; PRIOR FILING DATE: 1999-06-16
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: Microsoft Office 97
; SEQ ID NO 37
; LENGTH: 2406
; TYPE: DNA
; ORGANISM: Triticum aestivum
US-09-594-506-37 

Query Match 2.0%; Score 69.2; DB 4; Length 2406; 

Best Local Similarity 56.6%; Pred. No. 5.6e-07; 

Matches 128; Conservative 0; Mismatches 98; Indels 0; Gaps 0; 

Qy 3243 AAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAA 3302

II I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 2179 AACTTCATAAGCCCATAATTTTTTTGAGGAATCCCATTACATCTCGCAAAGCATTCACAA 2238

Qy 3303 ATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTG 3362 

 I IIII II III I I I I I I I II 

Db 2239 TGTCCTGTGTAATTTACTTTTTACACCTATCCTTGTACATATTTCTATATAAGTAGAATA 2298

Qy 3363 TGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAA 3422

I II I I I I I I I I I I I I II I I I I I I I I I I I I I 

Db 2299 TAAAAGATGTAACTAGATTGACAGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAGCAAAAA 2358

Qy 3423 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 2359 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2404 



RESULT 9 

US-09-465-558-59 

; Sequence 59, Application US/09465558
; Patent No. 6436657
; GENERAL INFORMATION:
; APPLICANT: Morakinyo, Layo O.
; APPLICANT: Orozco Jr, Emil M.
; TITLE OF INVENTION: TETRAHYDROFOLATE METABOLIC ENZYMES
; FILE REFERENCE: BB1322 US NA
; CURRENT APPLICATION NUMBER: US/09/465,558
; CURRENT FILING DATE: 1999-12-17
; EARLIER APPLICATION NUMBER: 60/112,734
; EARLIER FILING DATE: 1998-12-18
; NUMBER OF SEQ ID NOS: 70
; SOFTWARE: Microsoft Office 97
; SEQ ID NO 59
; LENGTH: 2202
; TYPE: DNA
; ORGANISM: Glycine max
US-09-465-558-59 

Query Match 2.0%; Score 69; DB 4; Length 2202; 

Best Local Similarity 69.9%; Pred. No. 6e-07; 

Matches 93; Conservative 0; Mismatches 40; Indels 0; Gaps 0; 

Qy 3336 CTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAG 3395

I I I I I I I I I I I I III II I III I I II I I I I I 

Db 2064 CTCCTTGTTTGTTTGCGTGCTTGGTGATCTGTATGAATGAAATAAATACGTGATTTAAGG 2123 

Qy 3396 GCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3455

II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2124 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2183 

Qy 3456 AAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I 
Db 2184 AAAAAAAAAAAAA 2196 



RESULT 10 
US-08-665-716-1 

; Sequence 1, Application US/08665716
; Patent No. 5789222
; GENERAL INFORMATION:
; APPLICANT: KELLY, ROSEMARIE
; APPLICANT: REGISTER, ELIZABETH A
; APPLICANT: MASUREKAR, PRAKASH S
; TITLE OF INVENTION: P5C REDUCTASE GENE FROM ZALERION
; TITLE OF INVENTION: ARBORICOLA
; NUMBER OF SEQUENCES: 2
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: MERCK & CO., INC.
; STREET: 126 E. LINCOLN AVENUE
; CITY: RAHWAY
; STATE: NEW JERSEY
; COUNTRY: US
; ZIP: 07065-0900
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/665,716
; FILING DATE: 23-JUN-1995
; CLASSIFICATION:
; ATTORNEY/AGENT INFORMATION:
; NAME: KORSEN, ELLIOTT
; REGISTRATION NUMBER: 32,705
; REFERENCE/DOCKET NUMBER: 19453PV
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 908-594-5493
; TELEFAX: 908-594-4720
; INFORMATION FOR SEQ ID NO: 1:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1147 base pairs
; TYPE: nucleic acid
; STRANDEDNESS: double
; TOPOLOGY: linear
; MOLECULE TYPE: cDNA
; HYPOTHETICAL: NO
; ANTI-SENSE: NO
; FEATURE:
; NAME/KEY: CDS
; LOCATION: 47..960
US-08-665-716-1 

Query Match 2.0%; Score 68.8; DB 1; Length 1147; 

Best Local Similarity 75.9%; Pred. No. 5.1e-07; 

Matches 85; Conservative 0; Mismatches 27; Indels 0; Gaps 0; 

Qy 3357 TGTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAA 3416

I I I I II III II II I I I I I I I I II I I I I I I I I I I 

Db 1021 TGCGTAGACACATGTCCAAGGAGTTCTGGGGTATAAAAAGTTGTTCATTTATGAAAAAAA 1080

Qy 3417 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1081 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1132 



RESULT 11 
US-09-182-816-22 

; Sequence 22, Application US/09182816
; Patent No. 6143542
; GENERAL INFORMATION:
; APPLICANT: Wisnewski, Nancy
; APPLICANT: Silver, Gary M.
; APPLICANT: Lo, Katherine C.
; APPLICANT: Brandt, Kevin S.
; TITLE OF INVENTION: NOVEL FLEA EPOXIDE HYDROLASE NUCLEIC ACID MOLECULES,
; TITLE OF INVENTION: PROTEINS AND USES THEREOF
; FILE REFERENCE: FC-3-C1
; CURRENT APPLICATION NUMBER: US/09/182,816
; CURRENT FILING DATE: 1998-10-29
; EARLIER APPLICATION NUMBER: 08/989,510
; EARLIER FILING DATE: 1997-12-12
; NUMBER OF SEQ ID NOS: 31
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 22
; LENGTH: 1736
; TYPE: DNA
; ORGANISM: Ctenocephalides felis
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (159)..(1553)
US-09-182-816-22 

Query Match 2.0%; Score 68.6; DB 3; Length 1736; 

Best Local Similarity 62.6%; Pred. No. 6.8e-07; 

Matches 107; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 

Qy 3298 AAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGT 3357 

I I I I I I I I I I I I I I I III II I I I I I I III 

Db 1565 AATAAATTATTTGTGATAATAATATAATGTTAAAAATAAATGTAATTACTGTGAAATAAA 1624

Qy 3358 GTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAA 3417

I I I I I I I I II II III II I I I I I I I I I I 

Db 1625 CGATATGGATTTTATTTCAAACTTGTCAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAA 1684

Qy 3418 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I II I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I I I I I 
Db 1685 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1735 



RESULT 12 

US-09-182-816-24/c 

; Sequence 24, Application US/09182816
; Patent No. 6143542
; GENERAL INFORMATION:
; APPLICANT: Wisnewski, Nancy
; APPLICANT: Silver, Gary M.
; APPLICANT: Lo, Katherine C.
; APPLICANT: Brandt, Kevin S.
; TITLE OF INVENTION: NOVEL FLEA EPOXIDE HYDROLASE NUCLEIC ACID MOLECULES,
; TITLE OF INVENTION: PROTEINS AND USES THEREOF
; FILE REFERENCE: FC-3-C1
; CURRENT APPLICATION NUMBER: US/09/182,816
; CURRENT FILING DATE: 1998-10-29
; EARLIER APPLICATION NUMBER: 08/989,510
; EARLIER FILING DATE: 1997-12-12
; NUMBER OF SEQ ID NOS: 31
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 24
; LENGTH: 1736
; TYPE: DNA
; ORGANISM: Ctenocephalides felis
US-09-182-816-24 

Query Match 2.0%; Score 68.6; DB 3; Length 1736; 

Best Local Similarity 62.6%; Pred. No. 6.8e-07; 

Matches 107; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 



Qy 3298 AAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGT 3357 

 I I I I I I I I I I I I I I I III II I I I I I I III 

Db 172 AATAAATTATTTGTGATAATAATATAATGTTAAAAATAAATGTAATTACTGTGAAATAAA 113

Qy 3358 GTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAA 3417



 I I I I I I I I II II III II I I I I I I I I I I 

Db 112 CGATATGGATTTTATTTCAAACTTGTCAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAA 53

Qy 3418 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 52 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2 



RESULT 13 
US-09-471-528-22 

; Sequence 22, Application US/09471528
; Patent No. 6153397
; GENERAL INFORMATION:
; APPLICANT: Wisnewski, Nancy
; APPLICANT: Silver, Gary M.
; APPLICANT: Lo, Katherine C.
; APPLICANT: Brandt, Kevin S.
; TITLE OF INVENTION: FLEA EPOXIDE HYDROLASE PROTEINS AND USES THEREOF
; FILE REFERENCE: FC-3-C1-1
; CURRENT APPLICATION NUMBER: US/09/471,528
; CURRENT FILING DATE: 1999-12-27
; EARLIER APPLICATION NUMBER: 09/182,816
; EARLIER FILING DATE: 1998-10-29
; EARLIER APPLICATION NUMBER: 08/989,510
; EARLIER FILING DATE: 1997-12-12
; NUMBER OF SEQ ID NOS: 35
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 22
; LENGTH: 1736
; TYPE: DNA
; ORGANISM: Ctenocephalides felis
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (159)..(1553)
US-09-471-528-22 

Query Match 2.0%; Score 68.6; DB 3; Length 1736; 

Best Local Similarity 62.6%; Pred. No. 6.8e-07; 

Matches 107; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 



Qy 3298 AAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGT 3357

I I I I I I I I I I I I I I I III II I I I I I I III 

Db 1565 AATAAATTATTTGTGATAATAATATAATGTTAAAAATAAATGTAATTACTGTGAAATAAA 1624

Qy 3358 GTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAA 3417

I I I I I I I I II II III II I I I I I I I I I I 

Db 1625 CGATATGGATTTTATTTCAAACTTGTCAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAA 1684

Qy 3418 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1685 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1735 



RESULT 14 

US-09-471-528-24/c 

; Sequence 24, Application US/09471528
; Patent No. 6153397
; GENERAL INFORMATION:
; APPLICANT: Wisnewski, Nancy
; APPLICANT: Silver, Gary M.
; APPLICANT: Lo, Katherine C.
; APPLICANT: Brandt, Kevin S.
; TITLE OF INVENTION: FLEA EPOXIDE HYDROLASE PROTEINS AND USES THEREOF
; FILE REFERENCE: FC-3-C1-1
; CURRENT APPLICATION NUMBER: US/09/471,528
; CURRENT FILING DATE: 1999-12-27
; EARLIER APPLICATION NUMBER: 09/182,816
; EARLIER FILING DATE: 1998-10-29
; EARLIER APPLICATION NUMBER: 08/989,510
; EARLIER FILING DATE: 1997-12-12
; NUMBER OF SEQ ID NOS: 35
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 24
; LENGTH: 1736
; TYPE: DNA
; ORGANISM: Ctenocephalides felis
US-09-471-528-24 

Query Match 2.0%; Score 68.6; DB 3; Length 1736; 

Best Local Similarity 62.6%; Pred. No. 6.8e-07; 

Matches 107; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 

Qy 3298 AAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGT 3357 

I I I I I I I I I I I I I I I III II I I I I I I III 

Db 172 AATAAATTATTTGTGATAATAATATAATGTTAAAAATAAATGTAATTACTGTGAAATAAA 113

Qy 3358 GTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAA 3417

I I I I I I I I II II III II I I I I I I I I I I 

Db 112 CGATATGGATTTTATTTCAAACTTGTCAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAA 53

Qy 3418 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 52 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 2 



RESULT 15 
US-09-634-530-22 

; Sequence 22, Application US/09634530
; Patent No. 6290958
; GENERAL INFORMATION:
; APPLICANT: Wisnewski, Nancy
; APPLICANT: Silver, Gary M.
; APPLICANT: Lo, Katherine C.
; APPLICANT: Brandt, Kevin S.
; TITLE OF INVENTION: FLEA EPOXIDE HYDROLASE PROTEINS AND USES THEREOF
; FILE REFERENCE: FC-3-C1-1
; CURRENT APPLICATION NUMBER: US/09/634,530
; CURRENT FILING DATE: 2000-08-08
; PRIOR APPLICATION NUMBER: 09/471,528
; PRIOR FILING DATE: 1999-12-27
; PRIOR APPLICATION NUMBER: 09/182,816
; PRIOR FILING DATE: 1998-10-29
; PRIOR APPLICATION NUMBER: 08/989,510
; PRIOR FILING DATE: 1997-12-12
; NUMBER OF SEQ ID NOS: 35
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 22
; LENGTH: 1736
; TYPE: DNA
; ORGANISM: Ctenocephalides felis
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (159)..(1553)
US-09-634-530-22 

Query Match 2.0%; Score 68.6; DB 3; Length 1736; 

Best Local Similarity 62.6%; Pred. No. 6.8e-07; 

Matches 107; Conservative 0; Mismatches 64; Indels 0; Gaps 0; 

Qy 3298 AAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGT 3357

I I I I I I II I I I I I I I III II I I I I I I III 

Db 1565 AATAAATTATTTGTGATAATAATATAATGTTAAAAATAAATGTAATTACTGTGAAATAAA 1624

Qy 3358 GTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAA 3417

I I I I I I I I I I II II I II I I I I I I I I I I 

Db 1625 CGATATGGATTTTATTTCAAACTTGTCAAATATAAAAAAAAAAAAAAAAAAAAAAAAAAA 1684

Qy 3418 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1685 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1735 



Search completed: January 29, 2004, 02:33:11 
Job time : 195 secs

GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model 



Run on: January 28, 2004, 20:53:15 ; Search time 1084 Seconds
(without alignments)
11442.066 Million cell updates/sec

Title: US-10-056-884A-1
Perfect score: 3468
Sequence: 1 caagcactgtgctaaagtgt aaaaaaaaaaaaaaaaaaaa 3468

Scoring table: IDENTITY_NUC
Gapop 10.0 , Gapext 1.0

Searched: 2356869 seqs, 1788235258 residues

Total number of hits satisfying chosen parameters: 4713738

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries



Database: Published_Applications_NA:*
1: /cgn2_6/ptodata/2/pubpna/US07_PUBCOMB.seq:*
2: /cgn2_6/ptodata/2/pubpna/PCT_NEW_PUB.seq:*
3: /cgn2_6/ptodata/2/pubpna/US06_NEW_PUB.seq:*
4: /cgn2_6/ptodata/2/pubpna/US06_PUBCOMB.seq:*
5: /cgn2_6/ptodata/2/pubpna/US07_NEW_PUB.seq:*
6: /cgn2_6/ptodata/2/pubpna/PCTUS_PUBCOMB.seq:
7: /cgn2_6/ptodata/2/pubpna/US08_NEW_PUB.seq:*
8: /cgn2_6/ptodata/2/pubpna/US08_PUBCOMB.seq:*
9: /cgn2_6/ptodata/2/pubpna/US09A_PUBCOMB.seq:
10: /cgn2_6/ptodata/2/pubpna/US09B_PUBCOMB.seq
11: /cgn2_6/ptodata/2/pubpna/US09C_PUBCOMB.seq
12: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq:
13: /cgn2_6/ptodata/2/pubpna/US09_NEW_PUB.seq2
14: /cgn2_6/ptodata/2/pubpna/US10A_PUBCOMB.seq
15: /cgn2_6/ptodata/2/pubpna/US10B_PUBCOMB.seq
16: /cgn2_6/ptodata/2/pubpna/US10_NEW_PUB.seq:
17: /cgn2_6/ptodata/2/pubpna/US60_NEW_PUB.seq:
18: /cgn2_6/ptodata/2/pubpna/US60_PUBCOMB.seq:



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
RESULT 1
US-10-056-884-1 

; Sequence 1, Application US/10056884
; Publication No. US20030032786A1
; GENERAL INFORMATION:
; APPLICANT: Bristol-Myers Squibb Company
; TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A NOVEL HUMAN POTASSIUM CHANNEL BETA-SUBUNIT,
; TITLE OF INVENTION: K+betaM2
; FILE REFERENCE: D0076 NP
; CURRENT APPLICATION NUMBER: US/10/056,884
; CURRENT FILING DATE: 2002-01-24
; PRIOR APPLICATION NUMBER: US 60/263,872
; PRIOR FILING DATE: 2001-01-24
; PRIOR APPLICATION NUMBER: US 60/269,794
; PRIOR FILING DATE: 2001-02-14
; NUMBER OF SEQ ID NOS: 73
; SOFTWARE: PatentIn version 3.0
; SEQ ID NO 1
; LENGTH: 3468
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: CDS
; LOCATION: (515)..(1798)
US-10-056-884-1 

Query Match 100.0%; Score 3468; DB 15; Length 3468; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 3468; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 CAAGCACTGTGCTAAAGTGTTTTTCATATGTCATGAAAAGTTGTGCCAGAAAATTATGGT 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 CAAGCACTGTGCTAAAGTGTTTTTCATATGTCATGAAAAGTTGTGCCAGAAAATTATGGT 60

Qy 61 TTGAACATGGGCAGTTTTCTCCTACCGTCAGCTATATCCACAAGCATCACATGAAGTGGA 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 TTGAACATGGGCAGTTTTCTCCTACCGTCAGCTATATCCACAAGCATCACATGAAGTGGA 120

Qy 121 GATCTGGCAGCTCTGTGTATTTCAGTCAAGTTCCACAATGAAACCTGACAATAATGGTAA 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 GATCTGGCAGCTCTGTGTATTTCAGTCAAGTTCCACAATGAAACCTGACAATAATGGTAA 180

Qy 181 AAACCAATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATG 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 AAACCAATACGGACATCTGAGTAACTGGGGAATTGGCCTGCCTTGCATGTGAGCTTGATG 240

Qy 241 GAAGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCAC 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 GAAGATTGGATATAGACGAGTTGATTATATTTTATGAAGTAGCAGCTCACTACCATCCAC 300

Qy 301 CATCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTG 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 CATCCAGGGTTTAAACTACTTTTTCAGCATCACTTCACCTGTGGACTCTTATACATTTTG 360

Qy 361 ATTTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCT 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 ATTTCTTGGGGGAAAAATACTGGGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCT 420

Qy 421 TTTCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTT 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 TTTCCCTTTCTTACAAGTTGATCCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTT 480

Qy 481 TAAATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCG 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 TAAATCAAAATAGCAGCAGCAGAAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCG 540

Qy 541 TTATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCT 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 TTATTATCCTCGAGAACAAGGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCT 600

Qy 601 GAATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTC 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 GAATGTCGGGGGTCAAGTTTATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTC 660

Qy 661 CCTCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTC 720
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 661 CCTCCTGTGGAAAATGTTTTCCCCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTC 720

Qy 721 CAAGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCT 780
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 721 CAAGGGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCT 780

Qy 781 CAGGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGA 840
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 781 CAGGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGA 840

Qy 841 AGCTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCA 900
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 841 AGCTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCA 900



Qy 901 AAGCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAG 960
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 901 AAGCCCAGATGAATTCTGCCACAGTGACTTTGAAGATGCCTCCCAAGGAAGCGACACAAG 960

Qy 961 AATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGG 1020
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 961 AATCTGCCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGG 1020

Qy 1021 TTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGT 1080
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1021 TTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGT 1080

Qy 1081 TCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTT 1140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1081 TCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTT 1140

Qy 1141 GAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAA 1200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1141 GAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAA 1200

Qy 1201 ATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGC 1260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1201 ATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGC 1260

Qy 1261 CTGTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTC 1320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1261 CTGTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGGTC 1320

Qy 1321 AAGCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGA 1380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1321 AAGCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGA 1380

Qy 1381 TTGCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGA 1440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1381 TTGCTGCTGCAAGAATGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGA 1440

Qy 1441 CCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCAT 1500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1441 CCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCAT 1500

Qy 1501 CTGTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGG 1560
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1501 CTGTGGTCCCGTGACACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGG 1560

Qy 1561 CCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCT 1620
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1561 CCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCT 1620

Qy 1621 GACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCT 1680
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1621 GACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCT 1680

Qy 1681 CTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGA 1740
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1681 CTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAAAAAAAAATTCCAGA 1740



Qy 1741 TCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATA 1800
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1741 TCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGGAAGTATCATCTATA 1800

Qy 1801 AGGGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAA 1860
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1801 AGGGAGGGCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAA 1860

Qy 1861 AAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACA 1920
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1861 AAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGATGCACATTTCTTAGAACACA 1920

Qy 1921 ATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACA 1980
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1921 ATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCACCTTAACATGTAAATCCACA 1980

Qy 1981 GGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTT 2040
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1981 GGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTTTTTTAGTTATTTGTTTGTT 2040

Qy 2041 TACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGC 2100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2041 TACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAGCCAGCTACGTAAAAGTAGC 2100

Qy 2101 TGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCC 2160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2101 TGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTTCTCTCATCCTTCTACCTCC 2160

Qy 2161 CTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTT 2220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2161 CTCCTTTGAATGAGGGTATGGTAGAAAAAGATCTGGCCCAATGGCATAAGTTTGGAATTT 2220

Qy 2221 TTAATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTT 2280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2221 TTAATTTTGGTTTTTCCTTTTGTTTATGGGGTTGGGGGGAATGGCAGATTTATATGACTT 2280

Qy 2281 TTCACTCAAATCTATATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAA 2340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2281 TTCACTCAAATCTATATGTGCCAGTTTATATTGACTCCGTATGCATGAGTATTTGTGCAA 2340

Qy 2341 CACAAGCACAACTAAGTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTC 2400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2341 CACAAGCACAACTAAGTATGTATATACACATGACGCACACGATGCCAGGGCCTAGACCTC 2400

Qy 2401 CCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCT 2460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2401 CCAAGGGCTGTGCTCCTGCTCCCAGCAGCCCTCTCTTAGAATATTTCAGATGGATGAGCT 2460

Qy 2461 TCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTC 2520
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2461 TCTGACTCTTTCTTAAAATTCTTTTGGGAAGATTTCCCAGCCTTTCTTCACAACACTTTC 2520

Qy 2521 TAACATCAAATGACTCTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCT 2580
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 2521 TAACATCAAATGACTCTCATCATCAACAAATTGTATTCCTTATTGTGAAATTAATACCCT 2580

Qy 2581 CAGGCT C CAT T TT ACT GCT TT GCT CT TT GT CT GC AT TAAGAGAGGAT GAGGAGAGCT GGT 2640 



Db 2581 CAGGCT C C AT TTT ACT GCT T T GCT CT T TGT CT GC AT T AAGAGAGGATGAGGAGAGCT GGT 2640 

Qy 2641 CAAAC AT T CCTT GT GTTAAAAAAAT CAAACAT T CAT AT C C ACAAAATT TT CT GCTAAAT G 2700 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2641 CAAACAT T CCTT GT GTTAAAAAAAT CAAACAT T CAT AT C C ACAAAATT TT CT GCTAAAT G 27 00 

Qy 2701 ACTCCACACTCAGCCTTCTCTACCCTGT^ACTGAATTATCACCCTTTTCTCCATGTTTTCA 2760 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 

Db 2701 ACTCCACACTCAGCCTTCTCTACCCTGAACTGAATTATCACCCTTTTCTCCATGTTTTCA 2760 

Qy 2761 GAGTT CT T ACT GC C C AC AGTT T AAT GGTGT GGC CTTTC C ACAT AAT C C AC AT TAAGTT CT 2820 

I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2761 GAGTT CT TACT G C C C AC AGTT T AAT GGT GT GGC CTT T CCACAT AAT CCAC AT TAAGTT CT 2820 

Qy 2821 GT GTT CCT GT GTT GT T GT GGAACTAAGGACAAC ACACAGT ACT T GAATAAGGGT C C GGCC 2 8 80 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 

Db 2821 GT GTT C CT GT GT T GT T GT GGAACTAAGGACAAC ACACAGT ACT T GAATAAGGGT C C GGC C 2880 

Qy 2881 T TTTGTTT GT TT T AGAGAAAGTT GT AT T C C ACACACAAC CTAAT AATT T CT T ATAAAAAT 2 94 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 2881 TTTTGTTTGTTT T AGAGAAAGTT GT AT T CC AC AC AC AAC CTAAT AAT T T CT T AT AAAAAT 2940 

Qy 2941 TTTAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATT 3000 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 2941 TTTAAACTACAAAGCTACATTTTTACTTGCTTGTAGCCGTTTTTGTTTGCCTTTGGGATT 3000 

Qy 3001 CGGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACA 3060 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 3001 CGGGCTTTGGCTGTGCCCATGCTAGGATTTAGCTGTGTCATTTTTATGATGTCTGTAACA 3060 

Qy 3061 AC C CAACAAGGT AACT GAAGCT C C AGAGT TAAGGTT T C AGATTT CTAAAT GAAACT AT CT 3120 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 

Db 3061 ACCCAACAAGGTAACTGAAGCTCCAGAGTTAAGGTTT CAGATTTCTAAAT GAAACTAT CT 3120 

Qy 3121 TTTTCAATTACATCCTGACTTGTATAGACACAGCCAAAAAGAAACTGTTAATAGCCATCC 3180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 3121 T TTTCAAT T ACAT C CT GACT T GTAT AGAC ACAGC CAAAAAGAAACT GTT AAT AGC CAT CC 3180 

Qy 3181 GTCCAT GTAACTCT GTATTTT ACTAAGGTACCAATAGCT CTTT CAT AGACTT GTGCTACA 3240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 

Db 3181 GTCCATGTAACTCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACA 3240

Qy 3241 AGAAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAA 3300

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 3241 AGAAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAA 3300

Qy 3301 AAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTG 3360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 3301 AAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTG 3360

Qy 3361 TGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAA 3420

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I II 

Db 3361 TGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAA 3420



Qy 3421 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 



Db 3421 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 



RESULT 2 
US-10-056-884-3 

; Sequence 3, Application US/10056884 
; Publication No. US20030032786A1 
; GENERAL INFORMATION: 

; APPLICANT: Bristol-Myers Squibb Company 

; TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A NOVEL HUMAN POTASSIUM CHANNEL 
BETA-SUBUNIT, 

; TITLE OF INVENTION: K+betaM2 
; FILE REFERENCE: D0076 NP 

; CURRENT APPLICATION NUMBER: US/10/056,884

; CURRENT FILING DATE: 2002-01-24 

; PRIOR APPLICATION NUMBER: US 60/263,872 

; PRIOR FILING DATE: 2001-01-24 

; PRIOR APPLICATION NUMBER: US 60/269,794 

; PRIOR FILING DATE: 2001-02-14 

; NUMBER OF SEQ ID NOS : 73 

; SOFTWARE: PatentIn version 3.0
; SEQ ID NO 3
; LENGTH: 769
; TYPE: DNA
; ORGANISM: Homo sapiens
US-10-056-884-3 

Query Match 22.2%; Score 769; DB 15; Length 769; 

Best Local Similarity 100.0%; Pred. No. 9.5e-172; 

Matches 769; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 393 AGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGATCCAAAGGATA 452

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 AGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGATCCAAAGGATA 60

Qy 453 AGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAGAAGAAAGGGA 512

I II I I I I I I II I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 61 AGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAGAAGAAAGGGA 120

Qy 513 CAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTC 572

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 CAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGGTCCGCAGTTC 180 

Qy 573 CCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCC 632 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 CCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTATTTTACTCGCC 240 

Qy 633 ATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAG 692

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 ATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCCCCAAAGAGAG 300

Qy 693 ACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGAT 752 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I 
Db 301 ACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGACAGAGATGGAT 360

Qy 753 TCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACT 812 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 



Db 361 TCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTGCCTGATCACT 420 

Qy 813 TTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCA 872 

I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 TTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCAGACTTGGTCA 480

Qy 873 AACTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTG 932

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 AACTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCACAGTGACTTTG 540

Qy 933 AAGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCG 992 

I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I 
Db 541 AAGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCCTGCCG 600 

Qy 993 ACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGG 1052 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 601 ACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGG 660 

Qy 1053 GACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCT 1112 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 661 GACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCT 720

Qy 1113 TGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGA 1161

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 721 TGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGA 769



RESULT 3 

US-10-029-386-10927/c 

Sequence 10927, Application US/10029386 
Publication No. US20030194704A1 
GENERAL INFORMATION: 
APPLICANT: Penn, Sharron G. 
APPLICANT: Rank, David R. 
APPLICANT: Hanzel, David K. 

TITLE OF INVENTION: HUMAN GENOME- DERIVED SINGLE EXON NUCLEIC ACID PROBES 
USEFUL FOR GENE 

TITLE OF INVENTION: EXPRESSION ANALYSIS TWO 
FILE REFERENCE: AEOMICA-X-2 

CURRENT APPLICATION NUMBER: US/10/029,386
CURRENT FILING DATE: 2001-12-20 
NUMBER OF SEQ ID NOS : 34288 

SOFTWARE: Annomax Sequence Listing Engine vers. 1.1 
SEQ ID NO 10927 
LENGTH: 541 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: MAP TO AC008716.6 

OTHER INFORMATION: EXPRESSED IN PLACENTA, SIGNAL = 0.44 
OTHER INFORMATION: EXPRESSED IN FETAL LIVER, SIGNAL =0.55 
OTHER INFORMATION: NT HIT: AB037738.1, EVALUE 0.00e+00 
OTHER INFORMATION: SWISSPROT HIT: O53257, EVALUE 3.90e+00
OTHER INFORMATION: EST_HUMAN HIT: AI345820.1, EVALUE 1.90e-01 
US-10-029-386-10927 



Query Match 8.0%; Score 276.4; DB 13; Length 541;



Best Local Similarity 97.9%; Pred. No. 4.8e-55; 

Matches 280; Conservative 0; Mismatches 6; Indels 0; Gaps 0; 

Qy 1336 TGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 1395

I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I 
Db 286 TTTCTTTTCAGGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 227 

Qy 1396 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 1455

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 226 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 167

Qy 1456 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 1515

I I I I I II I I I I I I I I II I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
Db 166 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 107

Qy 1516 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 1575

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 106 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 47

Qy 1576 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGATTCTG 1621

I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 46 CCAACAGTCAGAGATGCGGCGGAAAAGCGACTTACTCCGGACTCTG 1



RESULT 4 

US-10-029-386-24630/C 

Sequence 24630, Application US/10029386 
Publication No. US20030194704A1 
GENERAL INFORMATION: 
APPLICANT: Penn, Sharron G. 
APPLICANT: Rank, David R. 
APPLICANT: Hanzel, David K. 

TITLE OF INVENTION: HUMAN GENOME- DERIVED SINGLE EXON NUCLEIC ACID PROBES 
USEFUL FOR GENE 

TITLE OF INVENTION: EXPRESSION ANALYSIS TWO 
FILE REFERENCE: AEOMICA-X-2 

CURRENT APPLICATION NUMBER: US/10/029,386
CURRENT FILING DATE: 2001-12-20 
NUMBER OF SEQ ID NOS : 34288 

SOFTWARE: Annomax Sequence Listing Engine vers. 1.1 
SEQ ID NO 24630 
LENGTH: 279 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: MAP TO AC008716.6 

OTHER INFORMATION: EXPRESSED IN PLACENTA, SIGNAL =0.44 
OTHER INFORMATION: EXPRESSED IN FETAL LIVER, SIGNAL = 0.55 
OTHER INFORMATION: SWISSPROT HIT: P19836, EVALUE 2.30e+00 
OTHER INFORMATION: NT HIT: AB037738.1, EVALUE 0.00e+00 
US-10-029-386-24630 

Query Match 6.9%; Score 240; DB 13; Length 279; 

Best Local Similarity 98.0%; Pred. No. 1.4e-46; 

Matches 243; Conservative 0; Mismatches 5; Indels 0; Gaps 0; 



Qy 1336 TGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 1395

Db 248 TTTCTTTTCAGGTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAA 189

Qy 1396 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 1455

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 188 TGGCAAAGGTGACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAG 129

Qy 1456 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 1515

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 128 CTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGAC 69

Qy 1516 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 1575

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 68 ACGCCAGACCAACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGAT 9

Qy 1576 CCAACAGT 1583

Db 8 CCAACAGT 1



RESULT 5 

US-10-060-036-4467 

Sequence 4467, Application US/10060036 
Publication No. US20030073144A1 
GENERAL INFORMATION: 
APPLICANT: Benson, Darin R. 
APPLICANT: Kalos, Michael D. 
APPLICANT: Lodes, Michael J. 
APPLICANT: Persing, David H. 
APPLICANT: Hepler, William T. 
APPLICANT: Jiang, Yuqiu 

TITLE OF INVENTION: COMPOSITIONS AND METHODS FOR THE THERAPY 
TITLE OF INVENTION: AND DIAGNOSIS OF PANCREATIC CANCER 
FILE REFERENCE: 210121.566 

CURRENT APPLICATION NUMBER: US/10/060,036
CURRENT FILING DATE: 2002-01-30 
NUMBER OF SEQ ID NOS : 4560 

SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 4467 
LENGTH: 632 
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 
LOCATION: 552, 569 

OTHER INFORMATION: n = A,T,C or G 
US-10-060-036-4467 



Query Match 5.9%; Score 205; DB 15; Length 632; 

Best Local Similarity 93.4%; Pred. No. 4.4e-38; 

Matches 214; Conservative 0; Mismatches 15; Indels 0; Gaps 0; 

Qy 3206 AGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATT 3265

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 15 AGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATT 74

Qy 3266 TTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGT 3325



Db 75 TTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGT 134

Qy 3326 GTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACA 3385 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 135 GTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACA 194 

Qy 3386 GGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434

I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II 

Db 195 GGTAATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 243



RESULT 6 

US-10-060-036-564 

Sequence 564, Application US/10060036 
Publication No. US20030073144A1 
GENERAL INFORMATION: 
APPLICANT: Benson, Darin R.
APPLICANT: Kalos, Michael D.
APPLICANT: Lodes, Michael J.
APPLICANT: Persing, David H.
APPLICANT: Hepler, William T.
APPLICANT: Jiang, Yuqiu

TITLE OF INVENTION: COMPOSITIONS AND METHODS FOR THE THERAPY
TITLE OF INVENTION: AND DIAGNOSIS OF PANCREATIC CANCER
FILE REFERENCE: 210121.566

CURRENT APPLICATION NUMBER: US/10/060,036
CURRENT FILING DATE: 2002-01-30
NUMBER OF SEQ ID NOS: 4560

SOFTWARE: FastSEQ for Windows Version 4.0
SEQ ID NO 564
LENGTH: 614
TYPE: DNA

ORGANISM: Homo sapiens
FEATURE:

NAME/KEY: misc_feature
LOCATION: 534, 551, 575, 576
OTHER INFORMATION: n = A,T,C or G
US-10-060-036-564 



Query Match 5.8%; Score 201; DB 15; Length 614;

Best Local Similarity 93.3%; Pred. No. 3.9e-37;

Matches 210; Conservative 0; Mismatches 15; Indels 0; Gaps 0;



Qy 3210 ACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATTTTCA 3269

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 1 ACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATTTTCA 60

Qy 3270 GCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGTGTGC 3329

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 61 GCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGTGTGC 120

Qy 3330 CAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTA 3389

||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Db 121 CAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTA 180

Qy 3390 ATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAA 3434

Db 181 ATAAAGGCAATTGGATGATATCTGTAGGAGGAAAACAATGACTAA 225



RESULT 7 
US-10-080-980-1 

Sequence 1, Application US/10080980 
Publication No. US20030036115A1 
GENERAL INFORMATION: 
APPLICANT: Bristol-Myers Squibb Company 

TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A NOVEL HUMAN POTASSIUM CHANNEL 
BETA-SUBUNIT, 

TITLE OF INVENTION: K+betaM6, EXPRESSED HIGHLY IN THE SMALL INTESTINE 
FILE REFERENCE: D0121 NP 

CURRENT APPLICATION NUMBER: US/10/080,980 
CURRENT FILING DATE: 2002-02-21 
PRIOR APPLICATION NUMBER: US 60/270,132 
PRIOR FILING DATE: 2001-02-21 
PRIOR APPLICATION NUMBER: US 60/278,953 
PRIOR FILING DATE: 2001-03-27 
NUMBER OF SEQ ID NOS : 74 
SOFTWARE: PatentIn version 3.0
SEQ ID NO 1 
LENGTH: 2052 
TYPE: DNA 

ORGANISM: homo sapiens 
FEATURE:
NAME/KEY: CDS
LOCATION: (121)..(1095)
US-10-080-980-1 

Query Match 4.8%; Score 167; DB 15; Length 2052; 

Best Local Similarity 64.6%; Pred. No. 9.4e-29; 

Matches 267; Conservative 0; Mismatches 140; Indels 6; Gaps 1; 

Qy 967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026 

I I I I I I I III II III I I I I I I I I I I I I I I I I I 

Db 705 CACGCCGTCCCAGTCGCTGGACGGCAGCCGGCGCTCGGGCTACATCACCATCGGCTACCG 764 

Qy 1027 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 1086

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III 

Db 765 CGGCTCCTACACCATCGGGCGGGACGCGCAGGCGGACGCCAAGTTCCGGCGAGTGGCGCG 824 

Qy 1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146 

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 825 CATCACCGTTTGCGGAAAGACGTCGCTGGCCAAGGAGGTGTTTGGGGACACCCTGAACGA 884

Qy 1147 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 1206 

I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I II I I I I I I I I 
Db 885 AAGCCGGGACCCCGACCGTCCCCCGGAGCGCTACACCTCGCGCTATTACCTCAAGTTCAA 944 

Qy 1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266

I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 945 CTTCCTGGAGCAGGCCTTCGACAAGCTGTCCGAGTCGGGCTTCCACATGGTGGCGTGCAG 1004 

Qy 1267 CTCATCGGTGACAGCATCTTT CAT CAAC CAAT AT AC AGAT GACAAGAT CT GGT C 1320 

III III II III I I I I I I II I I I I I I I I I I I I I I I 



Db 1005 CTCCACGGGCACCTGCGCCTTTGCCAGCAGCACCGACCAGAGCGAGGACAAGATCTGGAC 1064 



Qy 1321 AAGCTACACTGAATATGTCTTCTACCGTGAGCCTTCCAGATGGTCACCCTCAC 1373 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1065 CAGCTACACCGAGTACGTCTTCTGCAGGGAGTGAGCTCCCCAGACCCCCTCGC 1117 



RESULT 8 

US-10-029-386-20178/C 

Sequence 20178, Application US/10029386 
Publication No. US20030194704A1 
GENERAL INFORMATION: 
APPLICANT: Penn, Sharron G. 
APPLICANT: Rank, David R. 
APPLICANT: Hanzel, David K. 

TITLE OF INVENTION: HUMAN GENOME- DERIVED SINGLE EXON NUCLEIC ACID PROBES 
USEFUL FOR GENE 

TITLE OF INVENTION: EXPRESSION ANALYSIS TWO 
FILE REFERENCE: AEOMICA-X-2 

CURRENT APPLICATION NUMBER: US/10/029,386
CURRENT FILING DATE: 2001-12-20 
NUMBER OF SEQ ID NOS : 34288 

SOFTWARE: Annomax Sequence Listing Engine vers. 1.1 
SEQ ID NO 20178 
LENGTH: 978
TYPE: DNA

ORGANISM: Homo sapiens
FEATURE:

OTHER INFORMATION: MAP TO AC000403.1

OTHER INFORMATION: EXPRESSED IN ADULT LIVER, SIGNAL = 12
OTHER INFORMATION: EXPRESSED IN PLACENTA, SIGNAL = 12
OTHER INFORMATION: EXPRESSED IN BRAIN, SIGNAL = 12
OTHER INFORMATION: EXPRESSED IN HELA, SIGNAL = 15
OTHER INFORMATION: EXPRESSED IN HEART, SIGNAL = 13
OTHER INFORMATION: EXPRESSED IN LUNG, SIGNAL = 7.7
OTHER INFORMATION: EXPRESSED IN BONE MARROW, SIGNAL = 8.1
OTHER INFORMATION: EXPRESSED IN FETAL LIVER, SIGNAL = 6.3
OTHER INFORMATION: SWISSPROT HIT: Q14681, EVALUE 2.00e-04
OTHER INFORMATION: EST_HUMAN HIT: BG387727.1, EVALUE 8.00e-64
OTHER INFORMATION: NT HIT: gi|6163086, EVALUE 0.00e+00
US-10-029-386-20178 



Query Match 4.5%; Score 156.2; DB 13; Length 978;

Best Local Similarity 65.6%; Pred. No. 2.2e-26;

Matches 246; Conservative 0; Mismatches 123; Indels 6; Gaps 1;



Qy 967 CCCCCCTTCCTCCCTGCTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAG 1026

I I II I I I III II III I I I I I I I I I I I I I I I I I

Db 376 CACGCCGTCCCAGTCGCTGGACGGCAGCCGGCGCTCGGGCTACATCACCATCGGCTACCG 317

Qy 1027 AGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCG 1086

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I IN

Db 316 CGGCTCCTACACCATCGGGCGGGACGCGCAGGCGGACGCCAAGTTCCGGCGAGTGGCGCG 257

Qy 1087 GATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGA 1146

II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I

Db 256 CATCACCGTTTGCGGAAAGACGTCGCTGGCCAAGGAGGTGTTTGGGGACACCCTGAACGA 197

Qy 1147 AAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAA 1206

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II

Db 196 AAGCCGGGACCCCGACCGTCCCCCGGAGCGCTACACCTCGCGCTATTACCTCAAGTTCAA 137



Qy 1207 GCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAA 1266 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 136 CTTCCTGGAGCAGGCCTTCGACAAGCTGTCCGAGTCGGGCTTCCACATGGTGGCGTGCAG 77 

Qy 1267 CTCATCGGTGACAGCATCTTT CAT CAAC CAAT AT AC AGAT GACAAGAT CT GGT C 1320 

III III II III II II I I II II I I I I I I I I I I I I I 

Db 76 CTCCACGGGCACCTGCGCCTTTGCCAGCAGCACCGACCAGAGCGAGGACAAGATCTGGAC 17

Qy 1321 AAGCTACACTGAATA 1335

I I I I I I I I II II 
Db 16 CAGCTACACCGAGTA 2



RESULT 9 
US-10-080-980-8 

; Sequence 8, Application US/10080980 
; Publication No. US20030036115A1 
; GENERAL INFORMATION: 

; APPLICANT: Bristol-Myers Squibb Company 

; TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A NOVEL HUMAN POTASSIUM CHANNEL 
BETA-SUBUNIT, 

; TITLE OF INVENTION: K+betaM6, EXPRESSED HIGHLY IN THE SMALL INTESTINE 
; FILE REFERENCE: D0121 NP

; CURRENT APPLICATION NUMBER: US/10/080,980 

; CURRENT FILING DATE: 2002-02-21 

; PRIOR APPLICATION NUMBER: US 60/270,132 

; PRIOR FILING DATE: 2001-02-21 

; PRIOR APPLICATION NUMBER: US 60/278,953 

; PRIOR FILING DATE: 2001-03-27 

; NUMBER OF SEQ ID NOS : 74 

; SOFTWARE: PatentIn version 3.0

; SEQ ID NO 8

; LENGTH: 688

; TYPE: DNA
; ORGANISM: homo sapiens

; FEATURE:

; NAME/KEY: misc_feature

; OTHER INFORMATION: wherein "N" is equal to "A", "C", "G" or "T".
US-10-080-980-8 

Query Match 3.0%; Score 104.6; DB 15; Length 688; 

Best Local Similarity 51.1%; Pred. No. 3e-14; 

Matches 192; Conservative 0; Mismatches 178; Indels 6; Gaps 1; 

Qy 563 TCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTAT 622 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 186 TCCGCGGAGCCACCGCTCTTCCCCGACATCGTGGAGCTGAACGTGGGGGGCCAGGTGTAC 245 

Qy 623 TTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCC 682 

I I I I I I I I I I III III I I I I I I I I I I I I I I I I 

Db 246 GTGACCCGGCGCTGCACGGTGGTGTCGGTGCCCGACTCGCTGCTCTGGCGCATGTTCACG 305 



Qy 683 CCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGAC 742

I II I I I I I I I I I I II I I II I I I I I I I I I I I I

Db 306 CAGCAGCA------GCCGCAGGAGCTGGCCCGGGACAGCAAAGGCCGCTTCTTTCTGGAC 359



Qy 743 AGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTG 802 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 360 CGGGACGGCTTCCTCTTCCGCTACATCCTGGATTACCTGCGGGACTTGCAGCTCGTGCTG 419 

Qy 803 CCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCA 862

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 

Db 420 CCCGACTACTTCCCCGAGCGCAGCCGGCTGCAGCGCGAGGCCGAGTACTTCGAGCTGCCA 479 

Qy 863 GACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGCCCAGATGAATTCTGCCAC 922

II III 

Db 480 GAGCTCGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 539 

Qy 923 AGTGACTTTGAAGATG 938

I I I I 

Db 540 NNNNNNNTGCACAAGG 555



RESULT 10 

US-09-918-995-2311 

Sequence 2311, Application US/09918995 
Publication No. US20030073623A1 
GENERAL INFORMATION: 
APPLICANT: Hyseq, Inc. 

TITLE OF INVENTION: NOVEL NUCLEIC ACID SEQUENCES OBTAINED 
TITLE OF INVENTION: FROM VARIOUS cDNA LIBRARIES 
FILE REFERENCE: 20411-756 

CURRENT APPLICATION NUMBER: US/09/918,995 
CURRENT FILING DATE: 2001-07-30 
PRIOR APPLICATION NUMBER: US/09/235,076 
PRIOR FILING DATE: 1999-01-20 
NUMBER OF SEQ ID NOS : 38054 
SOFTWARE: FastSEQ for Windows Version 3.0 
SEQ ID NO 2311 
LENGTH: 249 
TYPE: DNA 

ORGANISM: Homo sapiens 
US-09-918-995-2311 

Query Match 2.5%; Score 87; DB 11; Length 249; 

Best Local Similarity 65.2%; Pred. No. 2.4e-10; 

Matches 161; Conservative 0; Mismatches 80; Indels 6; Gaps 2; 

Qy 1236 CAGAGTGTGGATTCCACATGGTGGCCTGTAACTCATCGGTGACAGCATCTTTCATCAACC 1295

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 1 CCGAGGCCGGCTTCCACATGGTGGCGTGTAACTCCTCGGGCACCGCCGCCTTCGTCAACC 60 

Qy 1296 AATATACAGATGACAAGATCTGGTCAAGCTACACTGAATATGTCTTCTACCGTGAGCCT- 1354

III I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I III 

Db 61 AGTACCGCGACGACAAGATCTGGAGCAGCTACACCGAGTACATTTTCTTCCGACCACCTC 120

Qy 1355 -----TCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTGACA 1409

II I I I I I III II II I I I I I I I I I I I 

Db 121 AGAAAATAGTATCACCTAAACAAGAACATGAAGATAGGATACATGACCAAGTCACTGATA 180 



Qy 1410 AAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCCAGT 1469

III I I II II II II II II I I I I I I II I III II II II I I I I I I I I I 
Db 181 AAGGAAGTGAAAGTGGGACTTCCTGGAATGAGCTCTTCACTTCCAGTTGGGACAGCCATT 240



Qy 1470 CTGAGGC 1476 

I I I I I I 
Db 241 CAGAGGC 247 



RESULT 11 

US-09-814-353-4862/c 

Sequence 4862, Application US/09814353 
Publication No. US20030165831A1 
GENERAL INFORMATION: 
APPLICANT: Lee, John 
APPLICANT: Thompson, Pamela 
APPLICANT: Lillie, James 

TITLE OF INVENTION: NOVEL GENES, COMPOSITIONS, KITS, AND METHODS FOR 
TITLE OF INVENTION: IDENTIFICATION, ASSESSMENT, PREVENTION, AND 
TITLE OF INVENTION: THERAPY OF OVARIAN CANCER 
FILE REFERENCE: MRI-006B 

CURRENT APPLICATION NUMBER: US/09/814,353 
CURRENT FILING DATE: 2001-03-21 
PRIOR APPLICATION NUMBER: US 60/191,031 
PRIOR FILING DATE: 2000-03-21 
PRIOR APPLICATION NUMBER: US 60/207,124 
PRIOR FILING DATE: 2000-05-25 
PRIOR APPLICATION NUMBER: US 60/211,940 
PRIOR FILING DATE: 2000-06-15 
PRIOR APPLICATION NUMBER: US 60/216,820 
PRIOR FILING DATE: 2000-07-07 
PRIOR APPLICATION NUMBER: US 60/220,661 
PRIOR FILING DATE: 2000-07-25 
PRIOR APPLICATION NUMBER: US 60/257,672 
PRIOR FILING DATE: 2000-12-21 
NUMBER OF SEQ ID NOS : 22037 
SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 4862
LENGTH: 496
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 

LOCATION: 156, 157, 160, 161, 162, 163, 164, 165, 167, 168, 169, 170, 
LOCATION: 171, 172, 173, 174, 175, 196, 197, 198, 200, 203, 205, 206, 
LOCATION: 219, 220, 228, 232, 240, 241, 244, 245, 246, 247, 249, 250, 
LOCATION: 252, 253, 256, 258, 260, 263, 264, 265, 267, 268, 269 
OTHER INFORMATION: n = A,T,C or G 
FEATURE: 

NAME/KEY: misc_feature

LOCATION: 270, 271, 274, 275, 280, 287, 288, 289, 290, 303, 306, 317, 
LOCATION: 318, 322, 331, 347, 348, 355, 361, 362, 364, 367, 368, 369, 
LOCATION: 381, 383, 388, 393, 398, 404, 408, 409, 410, 411, 412, 413, 
LOCATION: 414, 415, 416, 417, 418, 419, 420, 421, 423, 424, 435 
OTHER INFORMATION: n = A,T,C or G 
FEATURE:

NAME/KEY: misc_feature

LOCATION: 436, 438, 441, 450, 451, 452, 453, 454, 455, 456, 457, 458, 

LOCATION: 470, 471, 472, 475, 477, 478, 481, 482, 484 

OTHER INFORMATION: n = A,T,C or G 
US-09-814-353-4862 

Query Match 2.3%; Score 80.6; DB 13; Length 496; 

Best Local Similarity 47.3%; Pred. No. 1.2e-08; 

Matches 131; Conservative 0; Mismatches 146; Indels 0; Gaps 0; 

Qy 3192 TCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAA 3251

I I I I I I I I III I I I I I I I I II I II 

Db 343 TTTTTTTTTTCCNCCCCTTTCNTTTNNAATTAAAAAANATNTTTTTTCCCCAANNNNAAA 284 

Qy 3252 AGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGT 3311 

III. I II III I I I I I I I I I I 

Db 283 AAANAAAANNAANNNNNTNNNTTNANTNTTNNTNNCNNNNGGNNAAAAAAANTTTNTTTT 224 

Qy 3312 CAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGT 3371 

III II I II I I I I I I I II 

Db 223 TTTNNAAAAAAAAAAAGNNANATNTNNNTTTTTTTTTTTTTTTTTTTTNNNNNNNNNCNN 164 

Qy 3372 GTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAA 3431

I II I I I I I II I I I I I I I I I I II I I I I I I I I I I I I 

Db 163 NNNNGCNNCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 104 

Qy 3432 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I I I I I I I I I I I I I II I II I I I I I I I I II I I I I I I I I 
Db 103 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 67 



RESULT 12 

US-09-814-353-11159/C 

; Sequence 11159, Application US/09814353 

; Publication No. US20030165831A1 

; GENERAL INFORMATION: 

; APPLICANT: Lee, John 

; APPLICANT: Thompson, Pamela

; APPLICANT: Lillie, James 

; TITLE OF INVENTION: NOVEL GENES, COMPOSITIONS, KITS, AND METHODS FOR 

; TITLE OF INVENTION: IDENTIFICATION, ASSESSMENT, PREVENTION, AND 

; TITLE OF INVENTION: THERAPY OF OVARIAN CANCER 

; FILE REFERENCE: MRI-006B 

; CURRENT APPLICATION NUMBER: US/09/814,353

; CURRENT FILING DATE: 2001-03-21 

; PRIOR APPLICATION NUMBER: US 60/191,031 

; PRIOR FILING DATE: 2000-03-21 

; PRIOR APPLICATION NUMBER: US 60/207,124 

; PRIOR FILING DATE: 2000-05-25 

; PRIOR APPLICATION NUMBER: US 60/211,940 

; PRIOR FILING DATE: 2000-06-15 

; PRIOR APPLICATION NUMBER: US 60/216,820 

; PRIOR FILING DATE: 2000-07-07 

; PRIOR APPLICATION NUMBER: US 60/220,661 

; PRIOR FILING DATE: 2000-07-25 

; PRIOR APPLICATION NUMBER: US 60/257,672 

; PRIOR FILING DATE: 2000-12-21 



NUMBER OF SEQ ID NOS : 22037 
SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 11159 
LENGTH: 496
TYPE: DNA 

ORGANISM: Homo sapiens 
FEATURE:

NAME/KEY: misc_feature

LOCATION: 156, 157, 160, 161, 162, 163, 164, 165, 167, 168, 169, 170, 
LOCATION: 171, 172, 173, 174, 175, 196, 197, 198, 200, 203, 205, 206, 
LOCATION: 219, 220, 228, 232, 240, 241, 244, 245, 246, 247, 249, 250, 
LOCATION: 252, 253, 256, 258, 260, 263, 264, 265, 267, 268, 269 
OTHER INFORMATION: n = A,T,C or G 
FEATURE:

NAME/KEY: misc_feature

LOCATION: 270, 271, 274, 275, 280, 287, 288, 289, 290, 303, 306, 317, 
LOCATION: 318, 322, 331, 347, 348, 355, 361, 362, 364, 367, 368, 369, 
LOCATION: 381, 383, 388, 393, 398, 404, 408, 409, 410, 411, 412, 413, 
LOCATION: 414, 415, 416, 417, 418, 419, 420, 421, 423, 424, 435 
OTHER INFORMATION: n = A,T,C or G 
FEATURE: 

NAME/KEY: misc_feature

LOCATION: 436, 438, 441, 450, 451, 452, 453, 454, 455, 456, 457, 458, 
LOCATION: 470, 471, 472, 475, 477, 478, 481, 482, 484 
OTHER INFORMATION: n = A,T,C or G 
US-09-814-353-11159 

Query Match 2.3%; Score 80.6; DB 13; Length 496; 

Best Local Similarity 47.3%; Pred. No. 1.2e-08; 

Matches 131; Conservative 0; Mismatches 146; Indels 0; Gaps 0; 

Qy 3192 TCTGTATTTTACTAAGGTACCAATAGCTCTTTCATAGACTTGTGCTACAAGAAGGTTAAA 3251

I I I I I I I I III I I I I I I I I II Ml 

Db 343 TTTTTTTTTTCCNCCCCTTTCNTTTNNAATTAAAAAANATNTTTTTTCCCCAANNNNAAA 284 

Qy 3252 AGACCAGTTTTATTTTCAGCATTCCTCATGCATTTCAGTGGTAACCAAAAAATAATTTGT 3311

III I II I II I Mill I III 

Db 283 AAANAAAANNAANNNNNTNNNTTNANTNTTNNTNNCNNNNGGNNAAAAAAANTTTNTTTT 224 

Qy 3312 CAATTAATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGT 3371 

III II I I I I I I I I I I II 

Db 223 TTTNNAAAAAAAAAAAGNNANATNTNNNTTTTTTTTTTTTTTTTTTTTNNNNNNNNNCNN 164 

Qy 3372 GTATGTGTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAA 3431

I II I I I II II I I II II II I I II II II II I II II I 

Db 163 NNNNGCNNCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 104 

Qy 3432 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468 

I I II II M II I I M I I II I I I II I I II II I I II II I I 

Db 103 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 67 



RESULT 13 
US-10-056-884-8/C 

; Sequence 8, Application US/10056884 
; Publication No. US20030032786A1 
; GENERAL INFORMATION: 



; APPLICANT: Bristol-Myers Squibb Company 

; TITLE OF INVENTION: POLYNUCLEOTIDE ENCODING A NOVEL HUMAN POTASSIUM CHANNEL 
BETA-SUBUNIT, 

TITLE OF INVENTION: K+betaM2 
FILE REFERENCE: D0076 NP 

CURRENT APPLICATION NUMBER: US/10/056,884
CURRENT FILING DATE: 2002-01-24 
PRIOR APPLICATION NUMBER: US 60/263,872 
PRIOR FILING DATE: 2001-01-24 
PRIOR APPLICATION NUMBER: US 60/269,794 
PRIOR FILING DATE: 2001-02-14 
NUMBER OF SEQ ID NOS : 73 
SOFTWARE: PatentIn version 3.0
SEQ ID NO 8 
LENGTH: 80 
TYPE: DNA 

ORGANISM: Artificial Sequence 
FEATURE : 

OTHER INFORMATION: Synthetic Oligonucleotide Modified To Contain Biotin at 
the 5 Pr 

OTHER INFORMATION: ime En 
US-10-056-884-8 

Query Match 2.3%; Score 80; DB 15; Length 80; 

Best Local Similarity 100.0%; Pred. No. 5.4e-09; 

Matches 80; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 783 GGGACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAG 842

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 80 GGGAC AGGC AGGT GGT CCT GCCTGAT C ACT TT C C AGAAAAAGGAAGACT GAAAAGGGAAG 21 

Qy 843 CTGAATACTTCCAGCTCCCA 8 62 

I I I I I I I I I I I I I I I I I I I I 
Db 20 CTGAATACTTCCAGCTCCCA 1 



RESULT 14 

US-09-834-975-451/C 

; Sequence 451, Application US/09834975 

; Patent No. US20020110815A1 

; GENERAL INFORMATION: 

; APPLICANT: Lillie, James 

; APPLICANT: Brown, Jeffrey 

; APPLICANT: Bolt, Andrew 

; APPLICANT: Van Huffel, Christophe 

; TITLE OF INVENTION: NOVEL GENES, COMPOSITIONS AND METHODS 

; TITLE OF INVENTION: FOR THE IDENTIFICATION, ASSESSMENT, PREVENTION, AND 
THERAPY 

; TITLE OF INVENTION: OF HUMAN CANCERS 

; FILE REFERENCE: MRI-016B 

; CURRENT APPLICATION NUMBER: US/09/834,975 

; CURRENT FILING DATE: 2001-04-13 

; PRIOR APPLICATION NUMBER: 60/197,538 

; PRIOR FILING DATE: 2000-04-14 

; NUMBER OF SEQ ID NOS: 1046 

; SOFTWARE: FastSEQ for Windows Version 4.0 

; SEQ ID NO 451 



; LENGTH: 425
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; LOCATION: (1)...(425)
; OTHER INFORMATION: n = A,T,C or G
US-09-834-975-451 

Query Match 2.3%; Score 79; DB 10; Length 425; 

Best Local Similarity 50.8%; Pred. No. 2.6e-08; 

Matches 184; Conservative 0; Mismatches 178; Indels 0; Gaps 0; 

Qy 3107 AAATGAAACTATCTTTTTCAATTACATCCTGACTTGTATAGACACAGCCAAAAAGAAACT 3166 

II I III I I I I I II II I I III I I I I I I I I I I I I I 
Db 381 AAAAAAAAAAATATTTTTTTTTTTTTTTTTTTTTTTTCCAAAAAAAAAAAAAAAAAAACC 322 

Qy 3167 GTTAATAGCCATCCGTCCATGTAACTCTGTATTTTACTAAGGTACCAATAGCTCTTTCAT 3226 

III I III I I I I I I III II 

Db 321 CTTTTTTTTTTTTAAAAAAAGTTTTTTTTTTTTAAAAACCCCCCCCTTTTTTTTTGGGGG 262 

Qy 3227 AGACTTGTGCTACAAGAAGGTTAAAAGACCAGTTTTATTTTCAGCATTCCTCATGCATTT 3286 

I II I I I I I I I I I I I I I I II I I 

Db 261 GGGGGGATTTTTTTTTTTTTTGGAAAAACCCCCTTTTTTTTTTTTTTTTTAAAAAAAACG 202 

Qy 3287 CAGTGGTAACCAAAAAATAATTTGTCAATTAATAGTTGTGTGCCAAGCACTCCTAATTTG 3346 

III I I I I I I I I I I II I I I I I I II I I Ml 

Db 201 GGGGGGGGAAAAAAAAAAAAACCCTTTTTTTTTTTTTTTGGGGGAAATTTTTTTTTTTTT 142 

Qy 3347 TTTTATTGCGTGTGTGTGCATGTGTGTATGTGTATCACAGGTAATAAAGGCAATTGGATG 3406 

I I I I I I I I I I I I I III I I I II II I 

Db 141 TTTTTTTTTTTAAAATTTTTTTTTTTTTTNNNAAAAAAAAAAAAAAAAAAAAAAAAAAAA 82 

Qy     3407 ATTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3466
            |  |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       81 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 22

Qy     3467 AA 3468
            ||
Db       21 AA 20



RESULT 15 
US-09-925-299-112 

; Sequence 112, Application US/09925299 

; Patent No. US20020055627A1 

; GENERAL INFORMATION: 

; APPLICANT: Rosen et al . 

; TITLE OF INVENTION: Nucleic Acids, Proteins and Antibodies 

; FILE REFERENCE: PA102 

; CURRENT APPLICATION NUMBER: US/09/925,299 

; CURRENT FILING DATE: 2001-08-10 

; PRIOR APPLICATION NUMBER: PCT/US00/05883 

; PRIOR FILING DATE: 2000-03-08 

; PRIOR APPLICATION NUMBER: 60/124,270 

; PRIOR FILING DATE: 1999-03-12 

; NUMBER OF SEQ ID NOS: 1556

; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 112
; LENGTH: 1492
; TYPE: DNA
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; LOCATION: (8)
; OTHER INFORMATION: n equals a,t,g, or c
; NAME/KEY: misc_feature
; LOCATION: (1487)
; OTHER INFORMATION: n equals a,t,g, or c
; NAME/KEY: misc_feature
; LOCATION: (1491)
; OTHER INFORMATION: n equals a,t,g, or c
US-09-925-299-112 



Query Match 2.3%; Score 79; DB 9; Length 1492; 

Best Local Similarity 70.2%; Pred. No. 5.6e-08; 

Matches 106; Conservative 0; Mismatches 45; Indels 0; Gaps 0; 

Qy 3318 ATAGTTGTGTGCCAAGCACTCCTAATTTGTTTTATTGCGTGTGTGTGCATGTGTGTATGT 3377 

I I I I I I I I I I I I I I I I I I I I I I II III 

Db 1276 AGAAATATATTGGAGGCAAAGTTCAGTTGATGACAATTGTGTATATGTTACTGATGCTGT 1335 



Qy 3378 GTATCACAGGTAATAAAGGCAATTGGATGATTAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3437 

II I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1336 AAATTATTTTTAATAAAGAAAATTGTATTATCA^ 1395 

Qy     3438 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 3468
            |||||||||||||||||||||||||||||||
Db     1396 AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1426



Search completed: January 29, 2004, 02:51:24 
Job time : 1088 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM nucleic - nucleic search, using sw model

Run on:         January 28, 2004, 20:27:00 ; Search time 6858 Seconds
                                           (without alignments)
                                           12290.452 Million cell updates/sec

Title:          US-10-056-884A-1
Perfect score:  3468
Sequence:       1 caagcactgtgctaaagtgt..........aaaaaaaaaaaaaaaaaaaa 3468

Scoring table:  IDENTITY_NUC
                Gapop 10.0 , Gapext 1.0

Searched:       22781392 seqs, 12152238056 residues

Total number of hits satisfying chosen parameters:      45562784

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries
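
The timing figures in this header can be cross-checked against each other. The following is a minimal sketch, assuming (this is inferred from the numbers, not stated by the report) that one cell update is one query-residue by database-residue comparison and that both strands of every database sequence are searched, which would also explain why the hit total is exactly twice the number of sequences. All names in the code are illustrative.

```python
# Sketch: reproduce the header's throughput and hit-count figures.
# Values are copied from the search header of this report.
QUERY_LENGTH = 3468              # "Perfect score" equals query length under IDENTITY_NUC
DB_SEQS = 22_781_392             # "Searched: ... seqs"
DB_RESIDUES = 12_152_238_056     # "Searched: ... residues"
SEARCH_SECONDS = 6858            # "Search time ... Seconds"
STRANDS = 2                      # ASSUMPTION: forward and complementary strands both searched


def million_cell_updates_per_sec(query_len, db_residues, seconds, strands=STRANDS):
    """Dynamic-programming cells filled per second, in millions."""
    cells = query_len * db_residues * strands
    return cells / seconds / 1e6


rate = million_cell_updates_per_sec(QUERY_LENGTH, DB_RESIDUES, SEARCH_SECONDS)
hits = DB_SEQS * STRANDS

print(f"{rate:.3f} Million cell updates/sec")
print(f"{hits} hits")
```

Under these assumptions the computed rate and hit count agree with the reported 12290.452 Million cell updates/sec and 45562784 hits.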



Database :       EST:*
                 1:  em_estba:*
                 2:  em_esthum:*
                 3:  em_estin:*
                 4:  em_estmu:*
                 5:  em_estov:*
                 6:  em_estpl:*
                 7:  em_estro:*
                 8:  em_htc:*
                 9:  gb_est1:*
                10:  gb_est2:*
                11:  gb_htc:*
                12:  gb_est3:*
                13:  gb_est4:*
                14:  gb_est5:*
                15:  em_estfun:*
                16:  em_estom:*
                17:  em_gss_hum:*
                18:  em_gss_inv:*
                19:  em_gss_pln:*
                20:  em_gss_vrt:*
                21:  em_gss_fun:*
                22:  em_gss_mam:*
                23:  em_gss_mus:*
                24:  em_gss_pro:*
                25:  em_gss_rod:*
                26:  em_gss_phg:*
                27:  em_gss_vrl:*
                28:  gb_gss1:*
                29:  gb_gss2:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
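
Pred. No. depends on the whole score distribution and cannot be recomputed from a single result, but two other per-result figures printed with each alignment can be rechecked directly. The sketch below assumes (an inference from the reported numbers, not a documented definition) that Query Match is the score divided by the perfect score, and that Best Local Similarity is matches divided by the number of aligned columns (matches + mismatches + indels). The function names are illustrative; the input values are copied from RESULT 1 to RESULT 5 of this search.

```python
# Sketch: recompute "Query Match" and "Best Local Similarity" from reported counts.
PERFECT_SCORE = 3468  # from the search header


def query_match_pct(score, perfect=PERFECT_SCORE):
    """ASSUMED definition: score as a percentage of the perfect score."""
    return 100.0 * score / perfect


def best_local_similarity_pct(matches, mismatches, indels):
    """ASSUMED definition: identical columns as a percentage of aligned columns."""
    return 100.0 * matches / (matches + mismatches + indels)


# accession: (score, matches, mismatches, indels), as printed in the alignments
results = {
    "AQ536411": (479, 482, 7, 0),
    "AK015313": (442.4, 554, 86, 20),
    "AQ525390": (400.2, 408, 16, 0),
    "CA463745": (352.2, 403, 63, 3),
    "BU961910": (352, 404, 65, 3),
}

for accession, (score, matches, mismatches, indels) in results.items():
    qm = query_match_pct(score)
    bls = best_local_similarity_pct(matches, mismatches, indels)
    print(f"{accession}  Query Match {qm:.1f}%  Best Local Similarity {bls:.1f}%")
```

For these five results the recomputed percentages agree with the values printed in the report, which supports the assumed definitions without proving them.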

SUMMARIES 

ALIGNMENTS 



RESULT 1 

AQ536411/C
LOCUS       AQ536411     489 bp    DNA     linear   GSS 18-MAY-1999
DEFINITION  RPCI-11-318B21.TJ RPCI-11 Homo sapiens genomic clone RPCI-11-318B21
            , genomic survey sequence.
ACCESSION   AQ536411
VERSION     AQ536411.1  GI:4848101
KEYWORDS    GSS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 489)
  AUTHORS   Zhao,S., Adams,M.D., Nierman,W., Malek,J., de Jong,P. and Venter
            ,J.C.
  TITLE     Use of BAC End Sequences from Library RPCI-11 for Sequence-Ready
            Map Building
  JOURNAL   Unpublished
COMMENT     Other_GSSs:  RPCI-11-318B21.TV
            Contact: Shaying Zhao, William Nierman, Mark Adams
            Department of Eukaryotic Genomics
            The Institute for Genomic Research
            9712 Medical Center Dr., Rockville, MD 20850
            Tel: 301 838 0200
            Fax: 301 838 0208
            Email: hbe@tigr.org
            Clones are derived from the human BAC library RPCI-11. For BAC
            library availability, please contact Pieter de Jong
            (pieter@dejong.med.buffalo.edu). Clones may be purchased from
            BACPAC Resources (http://bacpac.med.buffalo.edu/ordering) or from
            Research Genetics (info@resgen.com). BAC end search page:
            http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_search.html.
            Seq primer: SP6
            Class: BAC ends.
FEATURES             Location/Qualifiers
     source          1..489
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="GDB:7621772"
                     /db_xref="taxon:9606"
                     /clone="RPCI-11-318B21"
                     /sex="Male"
                     /cell_type="Lymphocytes"
                     /clone_lib="RPCI-11"
                     /note="Vector: pBACe3.6; Site_1: EcoRI; Site_2: EcoRI;
                     RPCI-11 Human Male BAC Library"
BASE COUNT      131 a    114 c    105 g    137 t      2 others
ORIGIN



Query Match 13.8%; Score 479; DB 28; Length 489;

Best Local Similarity 98.6%; Pred. No. 3.3e-42;

Matches 482; Conservative 0; Mismatches 7; Indels 0; Gaps 0;

Qy      383 GGATAAGAGGAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTTTCTTACAAGTTGAT 442
            ||||||||| ||||||||||||||||||||||||||||||||||| ||||||||||||||
Db      489 GGATAAGAGNAGGTCATTTTTTAATAAGTTAGCATCCTTTTCCCTATCTTACAAGTTGAT 430

Qy      443 CCAAAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAG 502
            ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      429 CCANAGGATAAGGCTGTGACTCCATTGGATTGCACCTTTAAATCAAAATAGCAGCAGCAG 370

Qy      503 AAGAAAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGG 562
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      369 AAGACAGGGACAATGGCTCTGAGTGGAAACTGTAGTCGTTATTATCCTCGAGAACAAGGG 310

Qy      563 TCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTAT 622
            |||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
Db      309 TCCGCAGTTCCCAACTCCTTCCGTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTTTAT 250

Qy      623 TTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCC 682
            |||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||
Db      249 TTTACTCGCCATACCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTTTCC 190

Qy      683 CCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGAC 742
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      189 CCAAAGAGAGACACGGCTAATGATCTAGCCAAGGACTCCAAGGGAAGGTTTTTCATTGAC 130

Qy      743 AGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTG 802
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      129 AGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGGGACAGGCAGGTGGTCCTG 70

Qy      803 CCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCCA 862
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
Db       69 CCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCTGAATACTTCCAGCTCCAA 10

Qy      863 GACTTGGTC 871
            |||||||||
Db        9 GACTTGGTC 1



RESULT 2 
AK015313
LOCUS       AK015313     810 bp    mRNA    linear   HTC 05-DEC-2002
DEFINITION  Mus musculus adult male testis cDNA, RIKEN full-length enriched
            library, clone:4930434H12 product:inferred: RIKEN cDNA 4930434H12
            gene / putative [Mus musculus], full insert sequence.
ACCESSION   AK015313
VERSION     AK015313.1  GI:12853602
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
   PUBMED   10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
   PUBMED   11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
   PUBMED   11076861
REFERENCE   4
  AUTHORS   Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
            Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
            Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
            Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
            Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
            Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
            Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
            Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
            Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
            Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
            Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
            Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
            Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
            Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
            Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
            Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
            Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
            Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
            and Hayashizaki,Y.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409 (6821), 685-690 (2001)
  MEDLINE   21085660
   PUBMED   11217851
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6  (bases 1 to 810)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. Second
            strand cDNA was prepared with the primer adapter of sequence [5'
            GAGAGAGAGATTCTCGAGTTAATTAAATTAATCCCCCCCCCCCCC 3']. cDNA was cleaved
            with BamHI and XhoI. cDNA of size comprised longer than 7 kb was
            selected before cloning. Vector: a modified pBluescript KS(+) after
            bulk excision from Lambda FLC I. Cloning sites, 5' end: SalI; 3'
            end: BamHI. Host: DH10B.
FEATURES             Location/Qualifiers
     source          1..810
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:4930434H12"
                     /db_xref="MGI:1896697"
                     /db_xref="taxon:10090"
                     /clone="4930434H12"
                     /sex="male"
                     /tissue_type="testis"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     misc_feature    1..810
                     /note="inferred: RIKEN cDNA 4930434H12 gene / putative
                     [Mus musculus] (UniGene|Mm.46143, TIGR-MGI1|TC1870,
                     evidence: UG/TGI)"
                     /db_xref="MGI:1914659"
BASE COUNT      226 a    192 c    208 g    184 t
ORIGIN



Query Match 12.8%; Score 442.4; DB 11; Length 810; 

Best Local Similarity 83.9%; Pred. No. 2e-38; 

Matches 554; Conservative 0; Mismatches 86; Indels 20; Gaps 4; 



Qy 1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406 



Db 167 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 226 

Qy 1407 ACAAAGAAGGGGAGAGC GGCACGT CTTGCAATGACCTCT CCACATCTAGCT GCGACAGCC 1466 

III I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I II I I I I I I I I I I I I 
Db 227 ACA AAGGAGAGAGC GGC ACCT C CT GCAAT GACCTGT C C ACTT C C AGCT GT GACAG CC 283 

Qy 1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526 

I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II II II I I I I I I I II 
Db 284 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 343 

Qy 1527 AC ATC C AGACT CT GGACCGT C CC AT CAAGAAGGGCC CT GT C CAGCT GAT C CAAC AGT CAG 1586 

I I I I I I I I I I I I I I I I II II I II I I I I I I II II II I I I I I I I I I I I I I I I I I I I 
Db 344 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 403 

Qy 1587 AGATGCGGC GGAAAAGC GACT T ACT C CGGATT CT GACTT C AGGCT C CAGGGAAT C GAAC A 1646 

I I I II I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 404 AGAT GAGGC GGAAAAGT GAC CT GCT C C GGACT CT GACGT C AGGCT C C AGGGAGTC GAAC A 463 

Qy 1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 464 TAAGCAGCAAAAAGAAAGCT GC GAAGGAAAAGCT CT C CAT C GAGGAAGAGCT GGAGAAAT 523 

Qy 1707 GT AT CCAGGATT T CCT AAAAAAAAAAAT T CC AGAT C GGTTT CCT GAGAGAAAACAT C CTT 1766 

I I I I I I I I I I I I I I I II I I I II I I I II I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 524 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAGCGAAAACATCCTT 583 

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826 

I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I III II I II 
Db 584 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG 632 

Qy 1827 AAAAAAAAGAGTCATTTT GAAATTAACCT CATAAAAGGAATTCATATTTTAAAGGAAAAA 1886 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 633 GTAGT CGCCACTTTGAAATAAAC CT CCCCAAAGGAAGACATATGTTAAAGGAAAAA 688 

Qy 1887 AATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTA 1946 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I II 
Db 689 TA-ACAACTAACGGTCCACATTTGTTAGATCACAAT-GTCCATTGATGTACTACTGCCTA 746 

Qy 1947 CT TT AC CT AGTT CACCT TAACATGT AAAT C C ACAGGGT AGATT T CT T T CT AGAT GT GGAA 2006 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 747 CTTTGCCTAGCTCACCTTAACGTGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAA 806 



RESULT 3 

AQ525390/C
LOCUS       AQ525390     592 bp    DNA     linear   GSS 11-MAY-1999
DEFINITION  HS_5228_B2_C05_T7A RPCI-11 Human Male BAC Library Homo sapiens
            genomic clone Plate=804 Col=10 Row=F, genomic survey sequence.
ACCESSION   AQ525390
VERSION     AQ525390.1  GI:4772710
KEYWORDS    GSS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 592)
  AUTHORS   Mahairas,G.G., Wallace,J.C., Smith,K., Swartzell,S., Holzman,T.,
            Keller,A., Shaker,R., Furlong,J., Young,J., Zhao,S., Adams,M.D. and
            Hood,L.
  TITLE     Sequence-tagged connectors: A sequence approach to mapping and
            scanning the human genome
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 96 (17), 9739-9744 (1999)
  MEDLINE   99380589
   PUBMED   10449764
COMMENT     Contact: Mahairas GG, Wallace JC, Hood L
            High Throughput Sequencing Center
            University of Washington
            401 Queen Anne Avenue North, Seattle, WA 98109, USA
            Tel: (206) 616-3618
            Fax: (206) 616-3887
            Email: jwallace@u.washington.edu
            Clones are derived from the human BAC library RPCI-11. For BAC
            library availability, please contact Pieter de Jong
            (pieter@dejong.med.buffalo.edu). Clones may be purchased from
            BACPAC Resources (http://bacpac.med.buffalo.edu/ordering_bac.htm)
            or from Research Genetics (info@resgen.com). BAC end Web Server:
            http://www.htsc.washington.edu
            Plate: 804  row: F  column: 10
            Seq primer: T7
            Class: BAC ends
            High quality sequence stop: 592.
FEATURES             Location/Qualifiers
     source          1..592
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /clone="Plate=804 Col=10 Row=F"
                     /sex="male"
                     /clone_lib="RPCI-11 Human Male BAC Library"
                     /note="Vector: pBACe3.6; Site_1: EcoRI; Site_2: EcoRI;
                     Male blood DNA was isolated from one randomly chosen donor
                     and partially digested with a combination of EcoRI and
                     EcoRI Methylase. Size selected DNA was cloned into the
                     pBACe3.6 vector at EcoRI sites"
BASE COUNT      157 a    139 c    133 g    158 t      5 others
ORIGIN

Query Match 11.5%; Score 400.2; DB 28; Length 592; 

Best Local Similarity 96.2%; Pred. No. 7.6e-34; 

Matches 408; Conservative 0; Mismatches 16; Indels 0; Gaps 0; 



Qy      928 CTTTGAAGATGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCTCCCTGCTCCC 987
            |||||||||||||||||||||||| ||||||||||| |||| || ||||||| ||||||
Db      545 CTTTGAAGATGCCTCCCAAGGAAGAGACACAAGAATGTGCCNCCTTTCCTCCGTGCTCCA 486

Qy      988 TGCCGACCGCAAGTGGGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACCTTGGGCAG 1047
            ||| ||||||||||| ||||||||||||||||||||||||||||||||||| ||||||||
Db      485 TGCGGACCGCAAGTGAGGTTTCATTACTGTGGGTTACAGAGGATCCTGCACTTTGGGCAG 426

Qy     1048 AGAGGGACAGGCAGATGCCAAGTTTCGGAGAGTTCCCCGGATTTTGGTTTGTGGAAGGAT 1107
            ||||||||||||||||||||||||||| ||||||||||||||| ||||||||||||||||
Db      425 AGAGGGACAGGCAGATGCCAAGTTTCGNAGAGTTCCCCGGATTNTGGTTTGTGGAAGGAT 366

Qy     1108 TTCCTTGGCAAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGC 1167
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
Db      365 TTCCTTGGCGAAAGAAGTCTTTGGAGAAACTTTGAATGAAAGCAGAGACCCTGATCGAGC 306

Qy     1168 CCCAGAAAGATACACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGA 1227
             ||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      305 GCCAGATAGATACACCTCCAGATTTTATCTCAAATTCAAGCACCTGGAAAGGGCTTTTGA 246

Qy     1228 TATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAACTCATCGGTGACAGCATCTTT 1287
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      245 TATGTTGTCAGAGTGTGGATTCCACATGGTGGCCTGTAACTCATCGGTGACAGCATCTTT 186

Qy     1288 CATCAACCAATATACAGATGACAAGATCTGGTCAAGCTACACTGAATATGTCTTCTACCG 1347
            ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db      185 CATCATCCAATATACAGATGACAAGATCTGGTCAAGCTACACTGAATATGTCTTCTACCG 126

Qy     1348 TGAG 1351
            | ||
Db      125 TAAG 122



RESULT 4 
CA463745
LOCUS       CA463745     784 bp    mRNA    linear   EST 12-NOV-2002
DEFINITION  AGENCOURT_10724816 NIH_MGC_169 Mus musculus cDNA clone
            IMAGE:6771233 5', mRNA sequence.
ACCESSION   CA463745
VERSION     CA463745.1  GI:24920097
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 784)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. Jonathan Kuo, NIMH
            cDNA Library Preparation: Michael Brownstein Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM3090  row: h  column: 16
            High quality sequence stop: 456.
FEATURES             Location/Qualifiers
     source          1..784
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:6771233"
                     /lab_host="DH10B (T1-phage-resistant)"
                     /clone_lib="NIH_MGC_169"
                     /note="Organ: Testicles; Vector: pDNR-LIB; Site_1: SfiI
                     (ggccattatggcc); Site_2: SfiI (ggccgcctcggcc); cDNA made
                     by oligo-dT priming and directionally cloned. 5' and 3'
                     adaptors were used in cloning as follows:
                     5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG-3' and
                     5'-ATTCTAGAGGCCGAGGCGGCCGACATG-dT(30)NN-3'. Full-length
                     enriched library was constructed using the Clontech
                     Creator SMART kit and size-selected to contain the 0.5 kb
                     size fraction. Library created in thelaboratory of M.
                     Brownstein (NIMH, NIH). Note: this is a NIH_MGC Library."
BASE COUNT      244 a    239 c    175 g    121 t      5 others
ORIGIN

Query Match 10.2%; Score 352.2; DB 14; Length 784; 

Best Local Similarity 85.9%; Pred. No. 8.5e-29; 

Matches 403; Conservative 0; Mismatches 63; Indels 3; Gaps 1; 



Qy     1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406
            ||||||||||| | |||||  |||| || || || |||||||||||||||||||| || |
Db      104 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 163

Qy     1407 ACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCC 1466
            |||   |||| ||||||||||| || ||||||||||| ||||| || ||||| |||||||
Db      164 ACA---AAGGAGAGAGCGGCACCTCCTGCAATGACCTGTCCACTTCCAGCTGTGACAGCC 220

Qy     1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526
            |||| |||||||||||||| ||||||||||| |||||||| || || || ||||||| ||
Db      221 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 280

Qy     1527 ACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAG 1586
            |||||||||||||||| || ||||||||||| || || || |||||||||||||||||||
Db      281 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 340

Qy     1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646
            ||||| |||||||||| ||| | ||||||| |||||| |||||||||||||| |||||||
Db      341 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACGTCAGGCTCCAGGGAGTCGAACA 400

Qy     1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706
            | ||||||||||| |||||||  || ||||||||||| || ||||| |||||||||||||
Db      401 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGCTCTCCATCGAGGAAGAGCTGGAGAAAT 460

Qy     1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
            |||||||||||||| | || | ||||||||||||||| || |||||| ||||||||||||
Db      461 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAGCGAAAACATCCTT 520

Qy     1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGG 1815
            |||| |||||||||||| || ||||||||||||| ||| | |   | ||
Db      521 GGCAGTCTGAACTTTTACGGGAGTATCATCTATAGGGGGGAGGCTGTGG 569



RESULT 5 
BU961910
LOCUS       BU961910     778 bp    mRNA    linear   EST 21-OCT-2002
DEFINITION  AGENCOURT_10617166 NIH_MGC_169 Mus musculus cDNA clone
            IMAGE:6742567 5', mRNA sequence.
ACCESSION   BU961910
VERSION     BU961910.1  GI:24191482
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 778)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. Jonathan Kuo, NIMH
            cDNA Library Preparation: Michael Brownstein Laboratory
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLCM3080  row: n  column: 06
            High quality sequence stop: 473.
FEATURES             Location/Qualifiers
     source          1..778
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:6742567"
                     /lab_host="DH10B (T1-phage-resistant)"
                     /clone_lib="NIH_MGC_169"
                     /note="Organ: Testicles; Vector: pDNR-LIB; Site_1: SfiI
                     (ggccattatggcc); Site_2: SfiI (ggccgcctcggcc); cDNA made
                     by oligo-dT priming and directionally cloned. 5' and 3'
                     adaptors were used in cloning as follows:
                     5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG-3' and
                     5'-ATTCTAGAGGCCGAGGCGGCCGACATG-dT(30)NN-3'. Full-length
                     enriched library was constructed using the Clontech
                     Creator SMART kit and size-selected to contain the 0.5 kb
                     size fraction. Library created in thelaboratory of M.
                     Brownstein (NIMH, NIH). Note: this is a NIH_MGC Library."
BASE COUNT      226 a    180 c    199 g    160 t     13 others
ORIGIN



Query Match 10.1%; Score 352; DB 13; Length 778;

Best Local Similarity 85.6%; Pred. No. 8.9e-29;

Matches 404; Conservative 0; Mismatches 65; Indels 3; Gaps 1;

Qy     1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406
            ||||||||||| | |||||  |||| || || || |||||||||||||||||||| || |
Db      183 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 242

Qy     1407 ACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCC 1466
            |||   |||| ||||||||||| || ||||||||||| ||||| || ||||| |||||||
Db      243 ACA---AAGGAGAGAGCGGCACCTCCTGCAATGACCTGTCCACTTCCAGCTGTGACAGCC 299

Qy     1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526
            |||| |||||||||||||| ||||||||||| |||||||| || || || ||||||| ||
Db      300 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 359

Qy     1527 ACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAG 1586
            |||||||||||||||| || ||||||||||| || || || |||||||||||||||||||
Db      360 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 419

Qy     1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646
            ||||| |||||||||| ||| | ||||||| |||||| |||||||||||||| |||||||
Db      420 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACGTCAGGCTCCAGGGAGTCGAACA 479

Qy     1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706
            | ||||||||||| |||||||  || ||||||||||| || ||||| |||||||||||||
Db      480 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGCTCTCCATCGAGGAAGAGCTGGAGAAAT 539

Qy     1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
            |||||||||||||| | || | ||||||||||||||| || |||||  ||||||||||||
Db      540 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAACGAAAACATCCTT 599

Qy     1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGG 1818
            |||| |||||||||||| |||||||||||||||| ||| | |    | | ||
Db      600 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGGAGGGCTGTGGGG 651



RESULT 6
BY714867
LOCUS BY714867 952 bp mRNA linear EST 17-DEC-2002
DEFINITION BY714867 RIKEN full-length enriched, adult male testis Mus musculus
cDNA clone 4930434H12 5', mRNA sequence.
ACCESSION BY714867
VERSION BY714867.1 GI:27127984
KEYWORDS EST.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1 (bases 1 to 952)
AUTHORS Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I., Kiyosawa,H.,
Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A., Schonbach,C.,
Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C., Hume,D.A.,
Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H., Batalov,S.,
Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V., Chothia,C.,
Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A., Fletcher,C.F.,
Forrest,A., Frazer,K.S., Gaasterland,T., Gariboldi,M., Gissi,C.,
Godzik,A., Gough,J., Grimmond,S., Gustincich,S., Hirokawa,N.,
Jackson,I.J., Jarvis,E.D., Kanai,A., Kawaji,H., Kawasawa,Y.,
Kedzierski,R.M., King,B.L., Konagaya,A., Kurochkin,I.V., Lee,Y.,
Lenhard,B., Lyons,P.A., Maglott,D.R., Maltais,L., Marchionni,L.,
McKenzie,L., Miki,H., Nagashima,T., Numata,K., Okido,T., Pavan,W.J.,
Pertea,G., Pesole,G., Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D.,
Ramachandran,S., Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z.,
Ringwald,M., Sandelin,A., Schneider,C., Semple,C.A., Setou,M.,
Shimada,K., Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D.,
Tomita,M., Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y.,
Watanabe,Y., Wells,C., Wilming,L.G., Wynshaw-Boris,A.,
Yanagisawa,M., Yang,I., Yang,L., Yuan,Z., Zavolan,M., Zhu,Y.,
Zimmer,A., Carninci,P., Hayatsu,N., Hirozane-Kishikawa,T., Konno,H.,
Nakamura,M., Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J.,
Aizawa,K., Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K.,
Ishii,Y., Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D.,
Shibata,K., Shinagawa,A., Yasunishi,A., Yoshino,M., Waterston,R.,
Lander,E.S., Rogers,J., Birney,E. and Hayashizaki,Y.
TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs
JOURNAL Nature 420, 563-573 (2002)
MEDLINE 22354683
PUBMED 12466851
COMMENT Contact: Yoshihide Hayashizaki 

Laboratory for Genome Exploration Research Group, RIKEN Genomic 

Sciences Center (GSC) , Yokohama Institute 

The Institute of Physical and Chemical Research (RIKEN) 

1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan 

Tel: 81-45-503-9222 

Fax: 81-45-503-9216 

Email: genome-res@gsc.riken.go.jp,

URL: http://genome.gsc.riken.go.jp/

Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Carninci,P., Fukuda,S.,
Hashizume,W., Hayashida,K., Hirozane,T., Hori,F., Imotani,K.,
Ishii,Y., Itoh,M., Kagawa,I., Kawai,J., Kojima,Y., Kondo,S., Konno,H.,
Koya,S., Miyazaki,A., Murata,M., Nakamura,M., Nomura,K.,
Numazaki,R., Ohno,M., Ohsato,N., Saito,R., Sakazume,N., Sano,H.,
Sasaki,D., Sato,K., Shibata,K., Shiraki,T., Tagami,M., Takeda,Y.,
Waki,K., Watahiki,A., Muramatsu,M. and Hayashizaki,Y. Direct
Submission

Computational Analysis of Full-Length Mouse cDNAs Compared with
Human Genome Sequences Mamm. Genome. 12, 673-677 (2001)

Normalization and subtraction of cap-trapper-selected cDNAs to 
prepare full-length cDNA libraries for rapid discovery of new 
genes. Genome Res. 10 (10), 1617-1630 (2000) 

RIKEN integrated sequence analysis (RISA) system — 384-format 
sequencing pipeline with 384 multicapillary sequencer. Genome Res. 
10 (11), 1757-1771 (2000) 

Computer-based methods for the mouse full-length cDNA 
encyclopedia: real-time sequence clustering for construction of a 
nonredundant cDNA library. Genome Res. 11 (2), 281-289 (2001) 

cDNA library was prepared and sequenced in Mouse Genome 
Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 

Please visit our web site (http://genome.gsc.riken.go.jp) for 
further details. 
FEATURES Location/Qualifiers
source 1..952
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="taxon:10090"
/clone="4930434H12"
/sex="male"
/tissue_type="testis"
/dev_stage="adult"
/lab_host="SOLR"
/clone_lib="RIKEN full-length enriched, adult male testis"
/note="Site_1: XhoI; Site_2: BamHI; cDNA library was
prepared and sequenced in Mouse Genome Encyclopedia
Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in
RIKEN. Division of Experimental Animal Research in Riken
contributed to prepare mouse tissues. 1st strand cDNA was
primed with a primer [5'
GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
prepared by using trehalose thermo-activated reverse
transcriptase and subsequently enriched for full-length by
cap-trapper. Second strand cDNA was prepared with the
primer adapter of sequence [5'
GAGAGAGAGAGCGGCCGCAATTAATTCTCGAGTTAATTAAATTAATCCCCCCCCCCC
3']. cDNA was cloned into the XhoI and BamHI sites."
BASE COUNT 233 a 259 c 226 g 229 t 5 others
ORIGIN



Query Match 10.1%; Score 350.6; DB 14; Length 952;
Best Local Similarity 75.5%; Pred. No. 1.1e-28;
Matches 506; Conservative 0; Mismatches 142; Indels 22; Gaps 5;



Qy 1347 GTGAGCCTTCCAGATGGTCACCCTCACACTGCGATTGCTGCTGCAAGAATGGCAAAGGTG 1406
I I I I I I I I I I I I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I II I
Db 167 GTGAGCCTTCCCGGTGGTCCTCCTCTCATTGTGACTGCTGCTGCAAGAATGGCAAGGGAG 226

Qy 1407 ACAAAGAAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCC 1466
III I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I
Db 227 ACA---AAGGAGAGAGCGGCACCTCCTGCAATGACCTGTCCACTTCCAGCTGTGACAGCC 283

Qy 1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526
I I I I I II I I I II I I I I I I I I I I II I I I I I I I II I I I I II II II I I I I I I I II
Db 284 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 343

Qy 1527 ACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAG 1586
I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II II I I I I I I I I I I II I I I I I I I
Db 344 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 403

Qy 1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646
I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I
Db 404 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACGTCAGGCTCCCGGGAGTCGAACA 463

Qy 1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706
I I I I I I I II I I I I I I I I II II I I I I I I I I I I I II II II I I I I I I I I I I I II
Db 464 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGCTCTCCATCGACGAAGAGCTGGAGAAAT 523

Qy 1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I II I I I I I I II I I I I I I I I
Db 524 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAGCGATCACATCCTT 583

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826
I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I III III I
Db 584 GGCAGTCTGAACTCTTACGGGAGTATCATCTATAGGGGGAGGCGTGCGGGCG 635

Qy 1827 AAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAA 1886
I I I I I I I III I I I I I I I I I I I I I I I I I II
Db 636 CT C GC GCT CT CAAAT ACAC CCN CT CAAAGGGC GAC AT AT C CT AAAC G- GAAA 686

Qy 1887 AATACAACTAATGATGCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTA 1946
III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 687 AGTATCACTAATCGNTCACATCTGTCACAGCACACACGT-CATCTATGTACTATCTCCTA 745

Qy 1947 CTTTACCTAGTTCACCTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAA 2006
I II I II I I I I I I I II I I III II I I I I I I I I II I
Db 746 CTTTGNCTA-CCCTCCTTAACGTGCCACTCACAGGGCACACTTTTTTTATATGTGGATCA 804

Qy 2007 GTACAAGAAA 2016
I I I I I I
Db 805 CTATAATATA 814



RESULT 7
BF391086/C
LOCUS BF391086 489 bp mRNA linear EST 27-NOV-2000
DEFINITION UI-R-CA1-bcd-a-05-0-UI.s1 UI-R-CA1 Rattus norvegicus cDNA clone
UI-R-CA1-bcd-a-05-0-UI 3', mRNA sequence.
ACCESSION BF391086
VERSION BF391086.1 GI:11375933
KEYWORDS EST.
SOURCE Rattus norvegicus (Norway rat)
ORGANISM Rattus norvegicus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae;
Rattus.
REFERENCE 1 (bases 1 to 489)
AUTHORS Bonaldo,M.F., Lennon,G. and Soares,M.B.
TITLE Normalization and subtraction: two approaches to facilitate gene
discovery
JOURNAL Genome Res. 6 (9), 791-806 (1996)
MEDLINE 97044477
PUBMED 8889548
COMMENT Contact: Soares, MB
Coordinated Laboratory for Computational Genomics
University of Iowa
375 Newton Road, 4156 MEBRF, Iowa City, IA 52242, USA
Tel: 319 335 8250
Fax: 319 335 9565
Email: bento-soares@uiowa.edu
The sequence contained an oligo-dT track that was present in the
oligonucleotide that was used to prime the synthesis of first
strand cDNA and therefore this may represent a bonafide poly A
tail. The sequence tag present in the cDNA between the NotI site
and the oligo-dT track served to identify it as a clone from the
normalized testis library cDNA Library Preparation: M.B. Soares Lab
Clone distribution: clones will be available through Research
Genetics (www.resgen.com) The following repetitive elements were
found in this cDNA sequence: 1-35, >POLY_A#Simple_repeat
Seq primer: M13 Forward
POLYA=Yes.
FEATURES Location/Qualifiers
source 1..489
/organism="Rattus norvegicus"
/mol_type="mRNA"
/strain="Sprague-Dawley"
/db_xref="taxon:10116"
/clone="UI-R-CA1-bcd-a-05-0-UI"
/lab_host="DH10B (Life Technologies)"
/clone_lib="UI-R-CA1"
/note="Vector: pT7T3D-Pac (Pharmacia) with a modified
polylinker; Site_1: Not I; Site_2: Eco RI; The UI-R-CA1
library is a subtracted library derived from the following
tissues: thalamus, cerebellum, hypothalamus, medulla, pons,
midbrain, cerebral cortex, corpus striatum, testis, and
hippocampus. For a detailed description of the library
from which this clone was derived, please visit our web
site at ratest.eng.uiowa.edu. The subtraction has been
previously described in (Bonaldo, Lennon and Soares,
Genome Research 6:791-806, 1996)
TAG_LIB=UI-R-CA1
TAG_TISSUE=testis
TAG_SEQ=ACGCAG"
BASE COUNT 89 a 123 c 111 g 164 t 2 others
ORIGIN

Query Match 9.8%; Score 338.6; DB 10; Length 489; 

Best Local Similarity 84.0%; Pred. No. 3.1e-27; 

Matches 400; Conservative 0; Mismatches 60; Indels 16; Gaps 1;



Qy 1413 AAGGGGAGAGCGGCACGTCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCCAGTCTG 1472
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I
Db 483 AAGGGGAGAGTGGCACTTCCTGCAATGACCTCTCTACTTCCAGCTGCGACAGCCAGTCAG 424

Qy 1473 AGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCAACATCC 1532
I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II II II II III I I I I I I II
Db 423 AGGCCAGCTCTCCCCAGGAGACAGTGATCTGTGGGCCTGTAACGCGTCAGGGCAACATCC 364

Qy 1533 AGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAGAGATGC 1592
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I
Db 363 AGACTCTGGACCGGCCCATCAAGAAAGGCCCCGTGCAGCTGATCCAACAGTCAGAGATGA 304

Qy 1593 GGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACATGAGCA 1652
I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 303 GGCGGAAAAGTGACCTGCTCCGGACTCTGACTTCCGGCTCTAGGGAGTCGAACATNAGCA 244

Qy 1653 GCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAATGTATCC 1712
I I I I I I I I I I I I I I II I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I
Db 243 GCAAAAAGAAAGCTGCGAAGGAAAAGGTCTCCATCGAGGAAGAGCTGGAGAAATGTATCC 184

Qy 1713 AGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTTGGCAAT 1772
I I I I I I I I I I II I I I I I I I I I I I I I II I II I I I I I I I I II I I I I I I I II I I I I I
Db 183 AGGATTTCCTGAAGATAAAAATTCCAGATCGCTTCCCTGAGAGAAAACATCCTTGGCAGT 124

Qy 1773 CTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAAAAAAAA 1832
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I I
Db 123 CTGAACTTTTACGGAAGTATCATCTATAGGGGAGGGCAGTGGGTAGTCA----------- 75

Qy 1833 AAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAAAA 1888
II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 74 -----CCACTTTGAAATAAACCTCCTGAAAGGAAGACATATATTAAAGGAAAAATA 24



RESULT 8 
AK043351 

LOCUS AK043351 2332 bp mRNA linear HTC 05-DEC-2002 

DEFINITION Mus musculus 7 days neonate cerebellum cDNA, RIKEN full-length
enriched library, clone:A730087N02 product:hypothetical protein,
full insert sequence.
ACCESSION AK043351
VERSION AK043351.1 GI:26335652
KEYWORDS HTC; CAP trapper.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1
AUTHORS Carninci,P. and Hayashizaki,Y.
TITLE High-efficiency full-length cDNA cloning
JOURNAL Meth. Enzymol. 303, 19-44 (1999)
MEDLINE 99279253
PUBMED 10349636
REFERENCE 2
AUTHORS Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
TITLE Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new genes
JOURNAL Genome Res. 10 (10), 1617-1630 (2000)
MEDLINE 20499374
PUBMED 11042159
REFERENCE 3
AUTHORS Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
TITLE RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer
JOURNAL Genome Res. 10 (11), 1757-1771 (2000)
MEDLINE 20530913
PUBMED 11076861
REFERENCE 4
AUTHORS Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
and Hayashizaki,Y.
TITLE Functional annotation of a full-length mouse cDNA collection



JOURNAL Nature 409 (6821), 685-690 (2001) 
MEDLINE 21085660 
PUBMED 11217851 
REFERENCE 5 

AUTHORS The FANTOM Consortium and the RIKEN Genome Exploration Research 
Group Phase I & II Team. 

TITLE Analysis of the mouse transcriptome based on functional annotation 

of 60,770 full-length cDNAs 

JOURNAL Nature 420, 563-573 (2002) 
REFERENCE 6 (bases 1 to 2332) 

AUTHORS Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
Muramatsu,M. and Hayashizaki,Y.
TITLE Direct Submission
JOURNAL Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
Physical and Chemical Research (RIKEN), Laboratory for Genome
Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
Fax:81-45-503-9216)
COMMENT cDNA library was prepared, and sequenced in Mouse Genome 

Encyclopedia Project of Genome Exploration Research Group in Riken 
Genomic Sciences Center and Genome Science Laboratory in RIKEN. 
Division of Experimental Animal Research in Riken contributed to 
prepare mouse tissues. 

Please visit our web site for further details. 
URL:http://genome.gsc.riken.go.jp/
URL:http://fantom.gsc.riken.go.jp/.
FEATURES Location/Qualifiers 
source 1..2332
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="FANTOM_DB:A730087N02"
/db_xref="taxon:10090"
/clone="A730087N02"
/tissue_type="cerebellum"
/clone_lib="RIKEN full-length enriched mouse cDNA library"
/dev_stage="7 days neonate"
CDS 379..1809
/note="unnamed protein product; hypothetical protein
(evidence: rsCDS, ProCrest, decoder, Longest-ORF)
putative"
/codon_start=1
/protein_id="BAC31527.1"
/db_xref="GI:26335653"
/translation="MALKDTGSGGSTILPISEMVSASSSPGAPLAAAPGPCAPSPFPE
VVELNVGGQVYVTKHSTLLSVPDSTLASMFSPSSPRGGARRRGDLPRDSRARFFIDRD
GFLFRYVLDYLRDKQLALPEHFPEKERLLREAEFFQLTDLVKLLSPKVTKQNSLNDEC
CQSDLEDNVSQGSSDALLLRGAAAGAPSGSGAHGVSGVVGGGSAPDKRSGFLTLGYRG
SYTTVRDNQADAKFRRVARIMVCGRIALAKEVFGDTLNESRDPDRQPEKYTSRFYLKF
TYLEQAFDRLSEAGFHMVACNSSGTAAFVNQYRDDKIWSSYTEYIFFRPPQKIVSPKQ
EHEDRKRDKVTDKGSESGTSCNELSTSSCDSHSEASTPQDNPANTQQAAAHQPNTLTL
DRPSRKAPVQWMPPPDKRRKSELFQSLISKSRETNLSKKKVCEKLSVEEEMKKCIQDF
KKIHIPDCFPERKRQWQSELLQKYGL"

BASE COUNT 556 a 647 c 620 g 509 t 

ORIGIN 

Query Match 9.5%; Score 329; DB 11; Length 2332; 

Best Local Similarity 58.5%; Pred. No. 1.3e-26; 

Matches 781; Conservative 0; Mismatches 455; Indels 99; Gaps 8; 

Qy 560 GGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTT 619
I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 481 GGGCCCTGCGCCCCGTCGCCCTTCCCCGAGGTAGTGGAGCTGAATGTTGGCGGCCAGGTT 540

Qy 620 TATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTT 679
I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I
Db 541 TATGTGACCAAGCATTCGACGTTACTCAGCGTCCCGGACAGCACTCTGGCCAGCATGTTC 600

Qy 680 T C C C C AAAG AG AG AC AC G G C T AAT GAT C TAGCCAAGGACT CCAAG 724
I I I I II III II I I I I I I I I I
Db 601 TCACCCTCTAGTCCCCGGGGCGGCGCTAGGCGCCGGGGCGACTTGCCCAGGGACAGCCGC 660

Qy 725 GGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGG 784
I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I
Db 661 GCGCGCTTCTTCATCGACCGCGACGGCTTCCTCTTTAGGTACGTGCTGGATTACCTGCGC 720

Qy 785 GACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCT 844
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I
Db 721 GACAAGCAGCTGGCACTGCCCGAGCACTTTCCCGAGAAGGAGAGGCTCCTGCGCGAAGCA 780

Qy 845 GAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGC 904
II I III I I II I I I III II I I II I II III I III I I I I I I II I I I
Db 781 GAGTTCTTTCAGCTCACCGACCTGGTCAAGCTGCTGTCGCCCAAGGTCACCAAGCAGAAC 840

Qy 905 CC AGATGAATTCT GCCACAGTGACTTT GAAGA 936
I I I I I I I I I II I I I I I I I I I I I I
Db 841 TCGCTCAACGATGAGTGCTGCCAGAGCGACCTGGAGGACAACGTTTCCCAGGGCAGCAGC 900

Qy 937 T GC CT C C C AAGGAAGCGACACAAGAAT CT GCC C C CCTT C CT 977
III I I I I I I I I I I III I
Db 901 GACGCACTGCTGCTGCGTGGGGCGGCGGCTGGCGCGCCCTCGGGCTCTGGGGCACATGGT 960

Qy 978 CCCTG CTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTG 1018
II I I I I I I I I I I I I I I I I I I
Db 961 GTCAGTGGTGTAGTCGGTGGTGGCAGCGCTCCGGACAAGCGCTCTGGGTTCCTCACCCTG 1020

Qy 1019 GGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGA 1078
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I III
Db 1021 GGCTACCGTGGCTCTTACACCACGGTGCGAGATAACCAGGCAGATGCCAAGTTCAGGCGT 1080

Qy 1079 GTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACT 1138
II I I I I I I I I I I I I I I III I I I I I I I I I I I I I II I I I I I I I I I
Db 1081 GTGGCGCGCATCATGGTGTGTGGGCGCATAGCCTTGGCCAAGGAGGTCTTTGGGGACACT 1140

Qy 1139 TTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTC 1198
I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1141 CTTAATGAGAGTCGCGACCCTGACCGTCAGCCTGAGAAGTACACATCCCGCTTCTACCTC 1200

Qy 1199 AAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTG 1258
II I I I I I I I I I I I I I I I I I I I I I I 111 I I I I I I I I I I I I I I
Db 1201 AAGTTCACCTACTTGGAGCAGGCGTTCGATCGACTGTCTGAGGCCGGCTTCCACATGGTG 1260

Qy 1259 GCCTGTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGG 1318
II II I I I II II I I I I I III I I I I I I I I I I I I I I I I I I I I I I I I
Db 1261 GCGTGCAACTCCTCTGGCACTGCCGCCTTTGTCAACCAGTACCGAGACGACAAGATCTGG 1320

Qy 1319 T CAAGCT AC ACT GAAT AT GTCT T CT AC C GT GAGC CT T CC AGATGGT CACC CT CACAC 1375
I I I I I I I I I I I I I I I I I I I I I I III II I I I I I I I II
Db 1321 AGCAGTTACACTGAATACATCTTCTTCCGACCACCTCAGAAAATAGTGTCACCCAAGCAA 1380

Qy 1376 T GC GAT T GCT GCT GCAAGAAT GGCAAAG GT GACAAAGAAGGGGAGAG C GGC AC GTCT 1432
I I I I I I I I I I I I I II I II I I I I I I I I I I I I II
Db 1381 GAACATGAAGACAGGAAACGCGACAAAGTCACAGACAAAGGAAGTGAGAGTGGGACTTCC 1440

Qy 1433 TGCAATGACCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAGGAG 1492
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I
Db 1441 TGCAATGAGCTCTCCACATCCAGCTGTGACAGCCACTCAGAGGCCAGCACTCCACAGGAC 1500

Qy 1493 A CG GT C AT CT GT GGT C CC GT GAC AC GCCAGAC CAAC AT C CAGACT CT GGAC CGT 1546
I I II II I I I I II I I I I I I I I I I I I I
Db 1501 AACCCAGCCAACACTCAGCAGGCTGCAGCTCACCAGCCTAACACCTTAACCTTGGATAGA 1560

Qy 1547 CCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGCGAC 1606
III I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I
Db 1561 CCCTCCAGGAAAGCACCTGTTCAGTGGATGCCACCACCAGACAAGCGCAGAAAAAGTGAA 1620

Qy 1607 TTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAAGCT 1666
I I I I II I I I I I I I I I I I I I I I I I I
Db 1621 CT CTT T CAGT CACTC AT C AGCAAGT CC CGAGAAACAAAT CT CT C CAAAAAGAAG 1674

Qy 1667 GTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTAAAA 1726
II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I
Db 1675 GTCTGTGAGAAGCTAAGTGTAGAAGAAGAAATGAAAAAGTGTATTCAGGATTTTAAAAAA 1734

Qy 1727 AAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTAAGG 1786
I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I
Db 1735 ATCCATATTCCAGATTGTTTTCCAGAGCGCAAACGCCAGTGGCAATCTGAACTCCTCCAA 1794

Qy 1787 AAGTATCATCTATAA 1801
II I I I I I I I
Db 1795 AAATATGGGTTGTAA 1809



RESULT 9 
AK047519 

LOCUS AK047519 2343 bp mRNA linear HTC 05-DEC-2002 

DEFINITION Mus musculus 10 days neonate cerebellum cDNA, RIKEN full-length 

enriched library, clone:B930082J01 product:hypothetical protein,

full insert sequence. 
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Carninci, P . , 

Itoh,M. , 
Harada, A. , 



TITLE 

JOURNAL 

MEDLINE 



ACCESSION AK047519
VERSION AK047519.1 GI:26092232
KEYWORDS HTC; CAP trapper.
SOURCE Mus musculus (house mouse)
ORGANISM Mus musculus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE 1
AUTHORS Carninci,P. and Hayashizaki,Y.
TITLE High-efficiency full-length cDNA cloning
JOURNAL Meth. Enzymol. 303, 19-44 (1999)
MEDLINE 99279253
PUBMED 10349636
REFERENCE 2
AUTHORS Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
TITLE Normalization and subtraction of cap-trapper-selected cDNAs to
prepare full-length cDNA libraries for rapid discovery of new genes
JOURNAL Genome Res. 10 (10), 1617-1630 (2000)
MEDLINE 20499374
PUBMED 11042159
REFERENCE 3
AUTHORS Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N.
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Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai, J., 
Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
TITLE RIKEN integrated sequence analysis (RISA) system--384-format
sequencing pipeline with 384 multicapillary sequencer
JOURNAL Genome Res. 10 (11), 1757-1771 (2000)
MEDLINE 20530913
PUBMED 11076861
REFERENCE 4
AUTHORS Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
and Hayashizaki,Y.
TITLE Functional annotation of a full-length mouse cDNA collection
JOURNAL Nature 409 (6821), 685-690 (2001)
MEDLINE 21085660
PUBMED 11217851
REFERENCE 5
AUTHORS The FANTOM Consortium and the RIKEN Genome Exploration Research
Group Phase I & II Team.
TITLE Analysis of the mouse transcriptome based on functional annotation
of 60,770 full-length cDNAs
JOURNAL Nature 420, 563-573 (2002)
REFERENCE 6 (bases 1 to 2343)
AUTHORS Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
Muramatsu,M. and Hayashizaki,Y.
TITLE Direct Submission
JOURNAL Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
Physical and Chemical Research (RIKEN), Laboratory for Genome
Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
Fax:81-45-503-9216)
COMMENT cDNA library was prepared and sequenced in Mouse Genome
Encyclopedia Project of Genome Exploration Research Group in Riken
Genomic Sciences Center and Genome Science Laboratory in RIKEN.
Division of Experimental Animal Research in Riken contributed to
prepare mouse tissues.

Please visit our web site for further details.
URL:http://genome.gsc.riken.go.jp/
URL:http://fantom.gsc.riken.go.jp/.
FEATURES Location/Qualifiers
source 1..2343
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6J"
/db_xref="FANTOM_DB:B930082J01"
/db_xref="taxon:10090"
/clone="B930082J01"
/tissue_type="cerebellum"
/clone_lib="RIKEN full-length enriched mouse cDNA library"
/dev_stage="10 days neonate"
misc_feature 1..2343
/note="hypothetical protein (evidence:
rsCDS, ProCrest, decoder, Longest-ORF)"
BASE COUNT 554 a 660 c 624 g 505 t
ORIGIN



Query Match 9.4%; Score 327.4; DB 11; Length 2343; 

Best Local Similarity 58.4%; Pred. No. 2e-26; 

Matches 780; Conservative 0; Mismatches 456; Indels 99; Gaps 8; 



Qy 560 GGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTT 619
I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I
Db 512 GGGCCCTGCGCCCCGTCGCCCTTCCCCGAGGTAGTGGAGCTGAATGTTGGCGGCCAGGTT 571

Qy 620 TATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTT 679
I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I
Db 572 TATGTGACCAAGCATTCGACGTTACTCAGCGTCCCGGACAGCACTCTGGCCAGCATGTTC 631

Qy 680 T C C C CAAAGAGAGAC AC GG CT AAT GAT C TAGCCAAGGACTC CAAG 724
I I I I II III II I I I I I I I I I
Db 632 TCACCCTCTAGTCCCCGGGGCGGCGCTAGGCGCCGGGGCGACTTGCCCAGGGACAGCCGC 691

Qy 725 GGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGG 784
I I I I I I I I I I I I I I I I I I I I I I I III I I I I II I I II I
Db 692 GCGCGCTTCTTCATCGACCGCGACGGCTTCCTCTTTAGGTACGTGCTGGATTACCTGCGC 751

Qy 785 GACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCT 844
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 752 GACAAGCAGCTGGCACTGCCCGAGCACTTTCCCGAGAAGGAGAGGCTCCTGCGCGAAGCA 811

Qy 845 GAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGC 904
II I III I I I I I I I III I I I I I I I II III I III I I I I I I I I I I I
Db 812 GAGTTCTTTCAGCTCACCGACCTGGTCAAGCTGCTGTCGCCCAAGGTCACCAAGCAGAAC 871

Qy 905 CC AGATGAATTCT GCCACAGT GACTTT GAAGA 936
I I I I I I I I I I I I I I I I I I I I I I I
Db 872 TCGCTCAACGATGAGTGCTGCCAGAGCGACCTGGAGGACAACGTTTCCCAGGGCAGCAGC 931

Qy 937 TGCCTCCCAAGGAAGCGACACAAGAATCTGCCCCCCTTCCT 977
III I I I I I I I I I I I I I I
Db 932 GACGCACTGCTGCTGCGTGGGGCGGCGGCTGGCGCGCCCTCGGGCTCTGGGGCACATGGT 991

Qy 978 CCCTG CTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTG 1018
II I I I I I I I I I I I I I I I I I I I
Db 992 GTCAGTGGTGTAGTCGGTGGTGGCAGCGCTCCGGACAAGCGCTCTGGGTTCCTCACCCTG 1051

Qy 1019 GGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGA 1078
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I I II I
Db 1052 GGCTACCGTGGCTCTTACACCACGGTGCGAGATAACCAGGCAGATGCCAAGTTCAGGCGT 1111

Qy 1079 GTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACT 1138
II I I I I I I I I I I I I I I III I I I I I II I I I I I I I I I I I I I I I I I
Db 1112 GTGGCGCGCATCATGGTGTGTGGGCGCATAGCCTTGGCCAAGGAGGTCTTTGGGGACACT 1171

Qy 1139 TTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTC 1198
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1172 CTTAATGAGAGTCGCGACCCTGACCGTCAGCCTGAGAAGTACACATCCCGCTTCTACCTC 1231

Qy 1199 AAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTG 1258
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1232 AAGTTCACCTACTTGGAGCAGGCGTTCGATCGACTGTCTGAGGCCGGCTTCCACATGGTG 1291

Qy 1259 GCCTGTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGG 1318
I I I I I I I I I I I I I I I I III II I I I I I I I I I I I I I I I I I I II II
Db 1292 GCGTGCAACTCCTCTGGCACTGCCGCCTTTGTCAACCAGTACCGAGACGACAAGATCTGG 1351

Qy 1319 T CAAGCT AC ACT GAAT AT GTCT T CT AC C GT GAGCCT TCCAGAT GGTCACCCT CACAC 1375



Db 1352 AGC AGTT ACACT GAAT AC AT CTT CT T C C GAC C AC CT C AGAAAAT AGT GT C ACC CAAGCAA 1411 

Qy 1376 TGCGATTGCTGCTGCAAGAATGGCAAAG GT GACAAAGAAGGGGAGAGC GGC AC GTCT 1432 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1412 GAACAT GAAGACAGGAAAC GC GACAAAGT C AC AGACAAAGGAAGT GAGAGT GG GACTT C C 1471 

Qy 1433 T GCAAT GAC CT CT CC ACAT CT AGCT GCGAC AGC C AGT CT GAGGC C AGCT CT CC C CAGGAG 1492 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 1472 T GCAAT GAGCT CT CC ACAT CC AGCT GT GAC AGC C ACT CAGAGGCC AGC ACT C C AC AGGAC 1531 

Qy 1493 A C GGT C AT CT GT GGT CCC GT GACAC GC C AGAC CAACAT C C AGACT CT G GAC CGT 1546 

I I II II I I I I I I I I I I I I I I I I I I I 

Db 1532 AAC C C AG C CAACACT C AGCAGGCT GCAGCT CAC C AGC CT AAC AC CTTAAC CTT GGAT AGA 1591 

Qy 1547 C CC AT CAAGAAGGGC C CT GT C C AGCT GATCCAAC AGT CAGAGAT GCGGCGGAAAAGC GAC 1606 

III I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1592 CCCT CCAGGAAAGCACCT GTT CAGTGGATGCCACCACCAGACAAGCGCAGAAACAGTGAA 1651 

Qy 1607 TTACTC CGGATTCTGACTT CAGGCTCCAGGGAAT CGAACAT GAGCAGCAAAAAAAAAGCT 1666 

I I I I III II I I I I I I I I I I I I I I I 

Db 1652 CTCTTT CAGTCACTCATCAGCAAGT CC CGAGAAACAAATCT CT CCAAAAAGAAG 1705 

Qy 1667 GTT AAAGAAAAGCTCT CAATT GAGGAG GAGCT GGAGAAAT GT AT C CAGGAT TT C CTAAAA 1726 

II I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I 

Db 1706 GT CT GT GAGAAGCTAAGT GT AGAAGAAGAAAT GAAAAAGT GT ATT C AGGATTT TAAAAAA 1765 

Qy 1727 AAAAAAATT C CAGAT C GGT TT CCT GAGAGAAAACAT CCT T GGCAATCT GAACT T TTAAGG 1786 

I I I I I I I I I I I I I I I II I II I I I I I I I II I I I I I I I I I II I 
Db 1766 AT CCAT ATT C CAGATT GT T TT CCAGAGCGCAAAC GC CAGT GGCAATCT GAACT CCT C CAA 1825 

Qy 1787 AAGTATCATCTATAA 1801 

I I I I I I I I I 

Db 1826 AAATATGGGTTGTAA 1840
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AK045439 2584 bp mRNA linear HTC 05-DEC-2002 

DEFINITION  Mus musculus adult male corpora quadrigemina cDNA, RIKEN
            full-length enriched library, clone:B230119K12 product:hypothetical
            protein, full insert sequence.
ACCESSION   AK045439
VERSION     AK045439.1 GI:26337364
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
  PUBMED    10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
  PUBMED    11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
  PUBMED    11076861
REFERENCE   4
  AUTHORS   Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
            Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
            Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
            Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
            Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
            Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
            Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
            Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
            Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
            Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
            Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
            Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
            Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
            Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
            Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
            Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
            Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
            Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
            and Hayashizaki,Y.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409 (6821), 685-690 (2001)
  MEDLINE   21085660
  PUBMED    11217851
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6 (bases 1 to 2584)
  AUTHORS   Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
            Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
            Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
            Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
            Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
            Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.

            Please visit our web site for further details.
            URL:http://genome.gsc.riken.go.jp/
            URL:http://fantom.gsc.riken.go.jp/.
FEATURES             Location/Qualifiers
     source          1..2584
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:B230119K12"
                     /db_xref="taxon:10090"
                     /clone="B230119K12"
                     /sex="male"
                     /tissue_type="corpora quadrigemina"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     CDS             262..1692
                     /note="unnamed protein product; hypothetical protein
                     (evidence: rsCDS, ProCrest, decoder, Longest-ORF)
                     putative"
                     /codon_start=1
                     /protein_id="BAC32368.1"
                     /db_xref="GI:26337365"
                     /translation="MALKDTGSGGSTILPISEMVSASSSPGAPLAAAPGPCAPSPFPE
                     VVELNVGGQVYVTKHSTLLSVPDSTLASMFSPSSPRGGARRRGDLPRDSRARFFIDRD
                     GFLFRYVLDYLRDKQLALPEHFPEKERLLREAEFFQLTDLVKLLSPKVTKQNSLNDEC
                     CQSDLEDNVSQGSSDALLLRGAAAGAPSGSGAHGVSGVVGGGSAPYKRSGFLTLGYRG
                     SYTTVRDNQADAKFRRVARIMVCGRIALAKEVFGDTLNESRDPDRQPEKYTSRFYLKF
                     TYLEQAFDRLSEAGFHMVACNSSGTAAFVNQYRDDKIWSSYTEYIFFRPPQKIVSPKQ
                     EHEDRKRDKVTDKGSESGTSCNELYTSSCDSHSEASTPQDNPANTQQAAAHQPNTLTL
                     DRPSRKAPVQWMPPPDKRRNSELFQSLISKSRETNLSKKKVCEKLSVEEEMKKCIQDF
                     KKIHIPDCFPERKRQWQSELLQKYGL"
BASE COUNT  681 a 636 c 646 g 621 t
ORIGIN
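Two quick consistency checks can be run on the record above: the base counts should add up to the sequence length given for the entry (2584 bp), and the CDS span should be a whole number of codons. This is a minimal sketch with hypothetical helper names; it only handles simple `start..end` locations, not joins, complements, or partial (`<`, `>`) locations.

```python
def parse_location(location: str) -> tuple[int, int]:
    """Parse a simple 1-based inclusive 'start..end' feature location."""
    start, end = location.split("..")
    return int(start), int(end)


def cds_codons(location: str) -> int:
    """Number of codons spanned by a simple CDS location (stop codon included)."""
    start, end = parse_location(location)
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"CDS length {length} is not a multiple of 3")
    return length // 3


# Values copied from the record above.
base_count = {"a": 681, "c": 636, "g": 646, "t": 621}
total_bases = sum(base_count.values())
codons = cds_codons("262..1692")
print(total_bases, codons)
```

For this entry the counts sum to 2584 and the CDS covers 477 codons, which leaves room for at most 476 translated residues plus a stop codon if the coding region is complete.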



Query Match 9.3%; Score 324.2; DB 11; Length 2584;
Best Local Similarity 58.3%; Pred. No. 4.1e-26;
Matches 778; Conservative 0; Mismatches 458; Indels 99; Gaps 8;



Qy 560 GGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTT 619 



1 1 1 1 1 I M 1 1 1 1 1 ll l ll 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 364 GGGCCCTGCGCCCCGTCGCCCTTCCCCGAGGTAGTGGAGCTGAATGTTGGCGGCCAGGTT 423

Qy 620 TATTTTACTCGCCATTCCACATTGATAAGCATCCCTCATTCCCTCCTGTGGAAAATGTTT 679
I I I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I
Db 424 TATGTGACCAAGCATTCGACGTTACTCAGCGTCCCGGACAGCACTCTGGCCAGCATGTTC 483

Qy 680 TCCCCAAAGAGAGACACGGCTAAT GAT C T AGC CAAG GACT C CAAG 724
I I I I II III II I I I I I I I I I
Db 484 TCACCCTCTAGTCCCCGGGGCGGCGCTAGGCGCCGGGGCGACTTGCCCAGGGACAGCCGC 543

Qy 725 GGAAGGTTTTTCATTGACAGAGATGGATTCTTGTTCCGTTATATTCTGGACTATCTCAGG 784
I I I I I I I I I I I I I I I I I I I I I II III I I I I I I I I I I I
Db 544 GCGCGCTTCTTCATCGACCGCGACGGCTTCCTCTTTAGGTACGTGCTGGATTACCTGCGC 603

Qy 785 GACAGGCAGGTGGTCCTGCCTGATCACTTTCCAGAAAAAGGAAGACTGAAAAGGGAAGCT 844
I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I
Db 604 GACAAGCAGCTGGCACTGCCCGAGCACTTTCCCGAGAAGGAGAGGCTCCTGCGCGAAGCA 663

Qy 845 GAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAAAGC 904
II I III I I I I I I I III II I I II I II III I III I I I I I I II I I I
Db 664 GAGTTCTTTCAGCTCACCGACCTGGTCAAGCTGCTGTCGCCCAAGGTCACCAAGCAGAAC 723

Qy 905 CC AGAT GAATT CT GCC ACAGT GACTTT GAAGA 936
I I I I I I I I I I I I I II I I I I I I I I
Db 724 TCGCTCAACGATGAGTGCTGCCAGAGCGACCTGGAGGACAACGTTTCCCAGGGCAGCAGC 783

Qy 937 T GCCT C C CAAGGAAGCGACACAAGAAT CTGC CC CC CT T C CT 977
III I I I I I I I I I I I I I I
Db 784 GACGCACTGCTGCTGCGTGGGGCGGCGGCTGGCGCGCCCTCGGGCTCTGGGGCACATGGT 843

Qy 978 CCCTG CTCCCTGCCGACCGCAAGTGGGGTTTCATTACTGTG 1018
II I I I I I I I I I I II I I I I I
Db 844 GTCAGTGGTGTAGTCGGTGGTGGCAGCGCTCCGTACAAGCGCTCTGGGTTCCTCACCCTG 903

Qy 1019 GGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGGAGA 1078
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I III
Db 904 GGCTACCGTGGCTCTTACACCACGGTGCGAGATAACCAGGCAGATGCCAAGTTCAGGCGT 963

Qy 1079 GTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAAACT 1138
II Mill II II I I I I I III I I I I I I I I I I I I I I I II I I I I I I I
Db 964 GTGGCGCGCATCATGGTGTGTGGGCGCATAGCCTTGGCCAAGGAGGTCTTTGGGGACACT 1023

Qy 1139 TTGAATGAAAGCAGAGACCCTGATCGAGCCCCAGAAAGATACACCTCCAGATTTTATCTC 1198
I II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I
Db 1024 CTTAATGAGAGTCGCGACCCTGACCGTCAGCCTGAGAAGTACACATCCCGCTTCTACCTC 1083

Qy 1199 AAATTCAAGCACCTGGAAAGGGCTTTTGATATGTTGTCAGAGTGTGGATTCCACATGGTG 1258
I I I I I I I I I I I I III II II I I I II I I I I I II I I I I II I I I I
Db 1084 AAGTTCACCTACTTGGAGCAGGCGTTCGATCGACTGTCTGAGGCCGGCTTCCACATGGTG 1143

Qy 1259 GCCTGTAACTCATCGGTGACAGCATCTTTCATCAACCAATATACAGATGACAAGATCTGG 1318
I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I II I I I I I I I II
Db 1144 GCGTGCAACTCCTCTGGCACTGCCGCCTTTGTCAACCAGTACCGAGACGACAAGATCTGG 1203

Qy 1319 TCAAGCT ACACT GAAT AT GT CT TCT AC C GT GAGC CT T C C AGAT GGT CACCCT C AC AC 1375
I I I I I I I I I I I I I I II I I I I I I III II I I I I I I I II



Db 1204 AGC AGTT ACACTGAAT AC AT CT TCT T C C GACC ACCT CAGAAAAT AGT GT C AC CCAAGCAA 1263 



Qy 1376 T G C GATT GCT GCT GC AAGAAT GGCAAAG GTGACAAAGAAGGGGAGAGCGGCACGTCT 1432 

II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1264 GAAC AT GAAGAC AGGAAAC GC GACAAAGT C AC AGACAAAGGAAGTGAGAGT GGGACTT C C 1323 

Qy 1433 T GCAAT GACCT CT C CAC AT CT AGCT G C GAC AGC CAGT CT GAGGCC AGCT CT C CC C AGGAG 1492 

I I I I I I I I I I I I I I I I I I II I I I I I I I II I I II I I I I I I I I I I I I I I I I I I 
Db 1324 T GCAAT GAGCT CT ACAC AT CCAGCT GT GAC AGC CACT C AGAGGC CAGC ACT CCAC AGGAC 1383 

Qy 1493 A C GGT CAT CT GT GGT C CC GT GACACG C CAGACCAACAT CC AGACT CTGGAC C GT 1546 

I I II II I I I I I I I I I I I I I I I I I I I 

Db 1384 AACCCAGCCAACACTCAGCAGGCTGCAGCTCACCAGCCTAACACCTTAACCTTGGATAGA 1443 

Qy 1547 CC CAT CAAGAAGGG C C CT GT C C AGCT GAT C CAACAGT C AGAGAT GC GGC GGAAAAGCGAC 1606 

III II I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 1444 CCCT CCAGGAAAGCACCT GTTCAGT GGATGCCACCACCAGACAAGCGCAGAAACAGTGAA 1503 

Qy 1607 TT ACT CC GGATT CT GACT T CAGGCT C CAGGGAAT C GAACAT GAGCAGCAAAAAAAAAGCT 1666 

I I I I III III I III I II I I I I I I I 

Db 1504 CT CTT TC AGT CACT CAT C AGCAAGT CC C GAGAAACAAAT CT CT CCAAAAAGAAG 1557 

Qy 1667 GTTAAAGAAAAGCT CT CAATT GAGGAGGAGCT GGAGAAAT GT AT CCAGGAT T T C CTAAAA 1726 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1558 GT CT GT GAGAAGCT AAGT GT AGAAGAAGAAAT GAAAAAGT GTATT C AGGAT T TTAAAAAA 1617 

Qy 1727 AAAAAAAT T C C AGAT C GGT TT C CT GAGAGAAAAC AT C CTT GGCAAT CT GAACTT T T AAGG 1786 

I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I II I I I I I 

Db 1618 AT C CAT AT TC C AGAT T GT T TT C C AGAGC GCAAACGC C AGT G GCAAT CT GAACT C CT C CAA 1677 

Qy 1787 AAGT AT CATCT ATAA 1801 

II III I I I I 

Db 1678 AAATATGGGTTGTAA 1692 
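The middle line of each alignment row marks the columns where the query (Qy) and database (Db) residues are identical. Where that line has been damaged, it can be regenerated for rows whose two sequences are the same length. The helper below is an illustrative sketch, not the search program's own code; it assumes gap columns are written as `-` and that only exact identities are marked, as with the IDENTITY_NUC scoring table.

```python
def match_line(query: str, subject: str, gap: str = "-") -> str:
    """Return a string with '|' under identical, non-gap columns and ' ' elsewhere."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same number of columns")
    return "".join(
        "|" if q == s and q != gap else " "
        for q, s in zip(query.upper(), subject.upper())
    )


# Final row of the alignment above: Qy 1787..1801 against Db 1678..1692.
qy = "AAGTATCATCTATAA"
db = "AAATATGGGTTGTAA"
line = match_line(qy, db)
print(qy)
print(line)
print(db)
print(line.count("|"), "identical columns")
```

For this row the regenerated line has nine identity marks, the same number of marks that survive in the printed match line.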



RESULT 11 

AK042569 

LOCUS       AK042569 2555 bp mRNA linear HTC 05-DEC-2002
DEFINITION  Mus musculus 7 days neonate cerebellum cDNA, RIKEN full-length
            enriched library, clone:A730006K23 product:hypothetical protein,
            full insert sequence.
ACCESSION   AK042569
VERSION     AK042569.1 GI:26335190
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)
  MEDLINE   99279253
  PUBMED    10349636
REFERENCE   2
  AUTHORS   Carninci,P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata,K.,
            Itoh,M., Konno,H., Okazaki,Y., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Normalization and subtraction of cap-trapper-selected cDNAs to
            prepare full-length cDNA libraries for rapid discovery of new genes
  JOURNAL   Genome Res. 10 (10), 1617-1630 (2000)
  MEDLINE   20499374
  PUBMED    11042159
REFERENCE   3
  AUTHORS   Shibata,K., Itoh,M., Aizawa,K., Nagaoka,S., Sasaki,N., Carninci,P.,
            Konno,H., Akiyama,J., Nishi,K., Kitsunai,T., Tashiro,H., Itoh,M.,
            Sumi,N., Ishii,Y., Nakamura,S., Hazama,M., Nishine,T., Harada,A.,
            Yamamoto,R., Matsumoto,H., Sakaguchi,S., Ikegami,T., Kashiwagi,K.,
            Fujiwake,S., Inoue,K., Togawa,Y., Izawa,M., Ohara,E., Watahiki,M.,
            Yoneda,Y., Ishikawa,T., Ozawa,K., Tanaka,T., Matsuura,S., Kawai,J.,
            Okazaki,Y., Muramatsu,M., Inoue,Y., Kira,A. and Hayashizaki,Y.
  TITLE     RIKEN integrated sequence analysis (RISA) system--384-format
            sequencing pipeline with 384 multicapillary sequencer
  JOURNAL   Genome Res. 10 (11), 1757-1771 (2000)
  MEDLINE   20530913
  PUBMED    11076861
REFERENCE   4
  AUTHORS   Kawai,J., Shinagawa,A., Shibata,K., Yoshino,M., Itoh,M., Ishii,Y.,
            Arakawa,T., Hara,A., Fukunishi,Y., Konno,H., Adachi,J., Fukuda,S.,
            Aizawa,K., Izawa,M., Nishi,K., Kiyosawa,H., Kondo,S., Yamanaka,I.,
            Saito,T., Okazaki,Y., Gojobori,T., Bono,H., Kasukawa,T., Saito,R.,
            Kadota,K., Matsuda,H., Ashburner,M., Batalov,S., Casavant,T.,
            Fleischmann,W., Gaasterland,T., Gissi,C., King,B., Kochiwa,H.,
            Kuehl,P., Lewis,S., Matsuo,Y., Nikaido,I., Pesole,G.,
            Quackenbush,J., Schriml,L.M., Staubli,F., Suzuki,R., Tomita,M.,
            Wagner,L., Washio,T., Sakai,K., Okido,T., Furuno,M., Aono,H.,
            Baldarelli,R., Barsh,G., Blake,J., Boffelli,D., Bojunga,N.,
            Carninci,P., de Bonaldo,M.F., Brownstein,M.J., Bult,C.,
            Fletcher,C., Fujita,M., Gariboldi,M., Gustincich,S., Hill,D.,
            Hofmann,M., Hume,D.A., Kamiya,M., Lee,N.H., Lyons,P.,
            Marchionni,L., Mashima,J., Mazzarelli,J., Mombaerts,P., Nordone,P.,
            Ring,B., Ringwald,M., Rodriguez,I., Sakamoto,N., Sasaki,H.,
            Sato,K., Schonbach,C., Seya,T., Shibata,Y., Storch,K.F., Suzuki,H.,
            Toyo-oka,K., Wang,K.H., Weitz,C., Whittaker,C., Wilming,L.,
            Wynshaw-Boris,A., Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S.
            and Hayashizaki,Y.
  TITLE     Functional annotation of a full-length mouse cDNA collection
  JOURNAL   Nature 409 (6821), 685-690 (2001)
  MEDLINE   21085660
  PUBMED    11217851
REFERENCE   5
  AUTHORS   The FANTOM Consortium and the RIKEN Genome Exploration Research
            Group Phase I & II Team.
  TITLE     Analysis of the mouse transcriptome based on functional annotation
            of 60,770 full-length cDNAs
  JOURNAL   Nature 420, 563-573 (2002)
REFERENCE   6 (bases 1 to 2555)
  AUTHORS   Adachi,J., Aizawa,K., Akimura,T., Arakawa,T., Bono,H., Carninci,P.,
            Fukuda,S., Furuno,M., Hanagaki,T., Hara,A., Hashizume,W.,
            Hayashida,K., Hayatsu,N., Hiramoto,K., Hiraoka,T., Hirozane,T.,
            Hori,F., Imotani,K., Ishii,Y., Itoh,M., Kagawa,I., Kasukawa,T.,
            Katoh,H., Kawai,J., Kojima,Y., Kondo,S., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Murata,M.,
            Nakamura,M., Nishi,K., Nomura,K., Numazaki,R., Ohno,M., Ohsato,N.,
            Okazaki,Y., Saito,R., Saitoh,H., Sakai,C., Sakai,K., Sakazume,N.,
            Sano,H., Sasaki,D., Shibata,K., Shinagawa,A., Shiraki,T.,
            Sogabe,Y., Tagami,M., Tagawa,A., Takahashi,F., Takaku-Akahira,S.,
            Takeda,Y., Tanaka,T., Tomaru,A., Toya,T., Yasunishi,A.,
            Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (16-JUL-2001) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues.

            Please visit our web site for further details.
            URL:http://genome.gsc.riken.go.jp/
            URL:http://fantom.gsc.riken.go.jp/.
FEATURES             Location/Qualifiers
     source          1..2555
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:A730006K23"
                     /db_xref="taxon:10090"
                     /clone="A730006K23"
                     /tissue_type="cerebellum"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="7 days neonate"
     CDS             262..1695
                     /note="unnamed protein product; hypothetical protein
                     (evidence: rsCDS, ProCrest, decoder, Longest-ORF)
                     putative"
                     /codon_start=1
                     /protein_id="BAC31296.1"
                     /db_xref="GI:26335191"
                     /translation="MALKDTGSGGSTILPISEMVSASSSPGAPLAAAPGPCAPSPFPE
                     IVELNVGGQVYVTKHSTLLSVPDSTLASMFSPSSPRGRAPRRRGDLPRDSRARFFIDR
                     DGFLFRYVLDYLRDKQLALPEHFPEKERLLREAEFFQLTDLVKLLSPKVTKQNSLNDE
                     CCQSDLEDNVSQGSSDALLLRGAAAGAPSGSGAHGVSGVVGGGSAPDKRSGFLTLGYR
                     GSYTTVRDNQADAKFRRVARIMVCGRIALAKEVFGDTLNESRDPDRQPEKYTSRFYLK
                     FTYLEQAFDRLSEAGFHMVACNSSGTAAFVNQYRDDKIWSSYTEYIFFRPPQKIVSPK
                     QEHEDRKRDKVTDKGSESGTSCNELSTSSCDSHSEASTPQDNPANTQQAAAHQPNTLT
                     LDRPSRKAPVQWMPPPDKRRNSELFQSLISKSRETNLSKKKVCEKLSVEEEMKKCIQD
                     FKKIHIPDCFPERKRQWQSELLQKYGL"
BASE COUNT  666 a 638 c 643 g 608 t
ORIGIN



Query Match 9.3%; Score 322.8; DB 11; Length 2555; 

Best Local Similarity 58.2%; Pred. No. 5.8e-26; 

Matches 779; Conservative 0; Mismatches 457; Indels 102; Gaps 8; 
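Each hit's summary is a sequence of `name value;` pairs, so it can be pulled into a dictionary for comparison across hits. The sketch below is an assumption about how one might parse these lines, not a format specification; the regular expression and the function name `parse_summary` are illustrative.

```python
import re

# A field name (letters, spaces, periods), whitespace, then a numeric value
# that may carry a percent sign or an exponent, terminated by a semicolon.
FIELD = re.compile(r"([A-Za-z][A-Za-z .]*?)\s+([-+0-9.eE%]+);")


def parse_summary(text: str) -> dict[str, str]:
    """Return the 'name value;' pairs of a hit summary as a dict of strings."""
    return {name.strip(): value for name, value in FIELD.findall(text)}


summary = """
Query Match 9.3%; Score 322.8; DB 11; Length 2555;
Best Local Similarity 58.2%; Pred. No. 5.8e-26;
Matches 779; Conservative 0; Mismatches 457; Indels 102; Gaps 8;
"""
fields = parse_summary(summary)
print(fields["Score"], fields["Pred. No."], fields["Indels"])
```

Values are kept as strings so that percentages and exponents survive unchanged; convert with `float()` or `int()` as needed.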

Qy 560 GGGTCCGCAGTTCCCAACTCCTTCCCTGAGGTGGTAGAGCTGAATGTCGGGGGTCAAGTT 619 

I II I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 364 GGGCCCTGCGCCCCGTCGCCCTTCCCCGAGATAGTGGAGCTGAATGTTGGCGGCCAGGTT 423 



Qy 620 T ATTTT ACT C G C CAT T C CAC ATTGATAAGCAT CC CT C AT TCCCTCCT GT GGAAAAT GT T T 679 

I I I I I I I I II I I I I I I I I I I I I I I I III I I I I I I 

Db 424 T ATGT GACCAAGCAT T C GACGTT ACT C AGC GT CC C GGACAGC ACT CT GGC C AGC AT GTT C 483 

Qy 680 T CCCCAAAGAGAGACAC GGCTAAT GATC TAGCCAAGGACTCC 721 

I I II II III II I I I I I I I I I 

Db 484 TCACCCTCTAGTCCCCGGGGGCGGGCGCCTAGGCGCCGGGGCGACTTGCCCAGGGACAGC 543

Qy 722 AAGGGAAGGT TTTT C ATT GAC AGAGAT GGAT T CTT GTTCC GT T AT ATTCT GGACT AT CT C 781 

I I I I I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I 
Db 544 CGCGCGCGCTTCTTCATCGACCGCGACGGCTTCCTCTTTAGGTACGTGCTGGATTACCTG 603 

Qy 782 AGGGACAGGCAGGT GGTCCT GCCT GAT CACTTTCCAGAAAAAGGAAGACT GAAAAGGGAA 841 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 604 C GCGACAAGC AGCT G GCACT GC CC GAGCACT T TC C C GAGAAGGAGAGGCT CCT GC GC GAA 663 

Qy 842 GCTGAATACTTCCAGCTCCCAGACTTGGTCAAACTCCTGACCCCCGATGAAATCAAGCAA 901 

II II I III I I I I I I I III I I I I I I I II III I III I I I I I I I I I 

Db 664 GCAGAGTTCTTTCAGCTCACCGACCTGGTCAAGCTGCTGTCGCCCAAGGTCACCAAGCAG 723 

Qy 902 AGCCC AGAT GAAT T CT GC CAC AGT GACT T T GAAGA 936 

III I I I I I I I I I I I I I I I I I I I I I I 

Db 724 AACT CGCT CAACGAT GAGT GCT GC CAGAGC GACCT GGAGGACAAC GT TT C CCAGGGCAGC 783 

Qy 937 T GC CT CC CAAGGAAGCGACACAAGAAT CT GCC C C C CT T 974 

III I I I I I I I I I I III 

Db 784 AGCGACGCACTGCTGCTGCGTGGGGCGGCGGCTGGCGCGCCCTCGGGCTCTGGGGCACAT 843 

Qy 975 CCTCCCTG CTCCCTGCCGACCGCAAGTGGGGTTTCATTACT 1015 

III I I I I I I I I I I I I I I I I 

Db 844 GGTGTCAGTGGTGTAGTCGGTGGTGGCAGCGCTCCGGACAAGCGCTCTGGGTTCCTCACC 903 

Qy 1016 GTGGGTTACAGAGGATCCTGCACCTTGGGCAGAGAGGGACAGGCAGATGCCAAGTTTCGG 1075

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II 

Db 904 CT GGGCT ACCGT GGCT CTT AC AC CAC GGT GC GAGATAAC C AGGCAGAT GCCAAGTT C AGG 963 

Qy 1076 AGAGTTCCCCGGATTTTGGTTTGTGGAAGGATTTCCTTGGCAAAAGAAGTCTTTGGAGAA 1135 

III I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I 
Db 964 CGTGTGGCGCGCATCATGGTGTGTGGGCGCATAGCCTTGGCCAAGGAGGTCTTTGGGGAC 1023 

Qy 1136 ACTTTGAAT GAAAGCAGAGACCCTGAT CGAGCCCCAGAAAGATACACCTCCAGATTTT AT 1195 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1024 ACTCT TAAT GAGAGT C GCGAC C CT GAC CGT CAGCCT GAGAAGT ACAC AT C C C GCTT CT AC 1083 

Qy 1196 C T CAAAT T CAAGCAC CT GGAAAGGGCT TT T GAT AT GT TGT CAGAGT GTGGATT C CAC AT G 1255 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1084 CTCAAGTTCACCTACTTGGAGCAGGCGTTCGATCGACTGTCTGAGGCCGGCTTCCACATG 1143 

Qy 1256 GT GGC CT GTAACTC AT C GGT GACAGC AT CT T T CAT CAAC CAAT AT AC AGAT GACAAGAT C 1315 

I I I I I I I I I I I I I I I I I I I III I I I I I I I I I I I I I I I I I I I I I 
Db 1144 GT GGC GT GCAACTCCT CT GGC ACT GC C GC CT T T GT CAAC CAGTAC C GAGAC GACAAGAT C 1203 

Qy 1316 T GGT CAAGCT AC ACT GAAT AT GTCT T CT AC C GT GAGCCT T C C AGAT GGT CAC C CTC A 1372 

III I I I I I I I I I I I I I I I I I I I I I I III II I I I I I I I 

Db 1204 T GGAGCAGT T AC ACT GAAT AC AT CT T CTT C C GAC CAC CT CAGAAAAT AGT GT CAC C CAAG 1263 



Qy 1373 CACTGCGATTGCTGCTGCAAGAATGGCAAAG GT GACAAAGAAGGGGAGAG CGGC AC G 1429
II II I I I I I I I I I I I I I I II I I I I I I I I I I I I
Db 1264 CAAGAACATGAAGACAGGAAACGCGACAAAGTCACAGACAAAGGAAGTGAGAGTGGGACT 1323

Qy 1430 TCTTGCAATGACCTCTCCACATCTAGCTGCGACAGCCAGTCTGAGGCCAGCTCTCCCCAG 1489
II I I I I I I I I I I II I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I I I III
Db 1324 TCCTGCAATGAGCTCTCCACATCCAGCTGTGACAGCCACTCAGAGGCCAGCACTCCACAG 1383

Qy 1490 GAGA C GGT C ATCT GT GGT C CC GT GACAC GC CAGAC CAAC AT C C AGAC T CT GGAC 1543
III I II II I I I I I I I I I I I I I I I I I I
Db 1384 GACAACCCAGCCAACACTCAGCAGGCTGCAGCTCACCAGCCTAACACCTTAACCTTGGAT 1443

Qy 1544 CGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAGAGATGCGGCGGAAAAGC 1603
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1444 AGACCCTCCAGGAAAGCACCTGTTCAGTGGATGCCACCACCAGACAAGCGCAGAAACAGT 1503

Qy 1604 GACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACATGAGCAGCAAAAAAAAA 1663
I I I I I I II I I I I I II I I I I I I I I I I I I
Db 1504 GAACT CTTT C AGT C ACT CAT C AGCAAGT CC C GAGAAACAAAT CT CTCCAAAAAG 1557

Qy 1664 GCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAATGTATCCAGGATTTCCTA 1723
II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1558 AAGGTCTGTGAGAAGCTAAGTGTAGAAGAAGAAATGAAAAAGTGTATTCAGGATTTTAAA 1617

Qy 1724 AAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTTGGCAATCTGAACTTTTA 1783
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 1618 AAAATCCATATTCCAGATTGTTTTCCAGAGCGCAAACGCCAGTGGCAATCTGAACTCCTC 1677

Qy 1784 AGGAAGTATCATCTATAA 1801
I I I I I I I I I
Db 1678 CAAAAATATGGGTTGTAA 1695



RESULT 12 

BQ713664 

LOCUS       BQ713664 973 bp mRNA linear EST 16-JUL-2002
DEFINITION  AGENCOURT_8480138 NIH_MGC_129 Mus musculus cDNA clone IMAGE:6310836
            5', mRNA sequence.
ACCESSION   BQ713664
VERSION     BQ713664.1 GI:21852563
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1 (bases 1 to 973)
  AUTHORS   NIH-MGC http://mgc.nci.nih.gov/.
  TITLE     National Institutes of Health, Mammalian Gene Collection (MGC)
  JOURNAL   Unpublished
COMMENT     Contact: Robert Strausberg, Ph.D.
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Susan L. Sullivan, PhD.
            cDNA Library Preparation: ResGen, Invitrogen Corp
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
            DNA Sequencing by: Agencourt Bioscience Corporation
            Clone distribution: MGC clone distribution information can be
            found through the I.M.A.G.E. Consortium/LLNL at:
            http://image.llnl.gov
            Plate: LLAM13733 row: in column: 13
            High quality sequence start: 23
            High quality sequence stop: 592.
FEATURES             Location/Qualifiers
     source          1..973
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /db_xref="taxon:10090"
                     /clone="IMAGE:6310836"
                     /lab_host="DH10B (phage-resistant)"
                     /clone_lib="NIH_MGC_129"
                     /note="Organ: olfactory epithelium; Vector:
                     pCMV-SPORT6.1.ccdb; Site_1: EcoRV; Site_2: NotI; Cloned
                     unidirectionally. Primer: Oligo dT. Average insert size
                     2.2 kb. Constructed by ResGen, Invitrogen Corp. Note: this
                     is a NIH_MGC Library."
BASE COUNT  275 a 216 c 192 g 288 t 2 others
ORIGIN



Query Match 8.9%; Score 310; DB 13; Length 973;
Best Local Similarity 68.8%; Pred. No. 2.4e-24;
Matches 681; Conservative 0; Mismatches 242; Indels 67; Gaps 16;



Qy 1690 GGAGGAG CT GGAGAAAT GT AT C CAGGATT T CCT A — AAAAAAAAAATTCCAGATCGGTTT 1747 

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II II 
Db 1 GGAAGAGCT GGAGAAAT GT AT CNCAGGATTTCTNT GAAGATAAAAATTCCAGATCGCTT C 60 



Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 

Db 

Qy 
Db 

Qy 
Db 

Qy 

Db 



1748 C CT GAGAGAAAAC AT C CTT GGCAAT CT GAACT TTT AAGGAAGT AT CAT CT ATAAGGGAGG 1807 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I III I 
C CT GAGC GAAAAC AT C CTT GGC AGT CT GAACT TT T AC GGAAGT AT CAT CT AT AGGGGGAG 



61 



120 



18 08 GCTGGGGGCGGGGAAAAAAAAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAAT 1867 

I III I I I I I I I I I I I I I I I I I I I I I I I I I 

121 GGCTGTGG GT AGTC GCCACT TT GAAATAAAC CT C C C CAAAGGAAG 165 

1868 T C AT AT TTTAAAGGAAAAAAAT ACAACTAAT GAT GCAC ATTT CT T AGAAC ACAAT AGT C C 1927 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

AC AT AT GTTAAAGGAAAAAT A- ACAACT AACGGT C CACAT TT GT T AGAT C ACAAT - GT C C 223 



166 



1928 



224 



ATT GAT AT ACT ACT GCCT ACT T T AC CT AGTT CAC CT TAACAT GTAAAT C C AC AGGGT AG A 1987 
I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I 
AT TGAT GT ACT ACT GC CT ACT T T GC CT AGCT C AC CT TAAC GT GTAAAT CCAC AGGGT AGA 283 



1988 T T TCTT T CT AGAT GT GGAAGT ACAAGAAAAT CTT TTT T AGT — T ATTT GT T T GT T TACT T 2045 

I II I I I I I I I I I I I I I I I I III I II III I I I I I I I I I I I I 

T T T CTTT CT AGAT GT GGAAC CAGAAAC GAGCT CT T AGTT GT C CT T GT CT T T TAT T TACT T 



284 



343 



2046 C GT C C CAT GT GCT AACT AT CTT - AT AT AT AAT GAGAGCC AGCTAC GTAAAAGT AGCT GAG 2104 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

344 GGTC C CAT GT GCT GAGAAT CT T AAGAT ACAACAAGAACCAGCT AC GT GT GAGT AGCT CAC 403 

2105 AGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTT 2141 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

404 AGGCTTTGGGAATCATTGATCCCAAACCAGGTTTTTTTGTTTTGTTTTGTTTTGTTTTGT 463 



Qy 



2142 - 



- - CT CT CAT C CTT CT ACCT C C CT C CT TT GA — AT GAGG GT AT GGT AGAAAAAGAT CT GG 2196 



Db 

Qy 

Db 
Qy 

Db 

Qy 
Db 

Qy 
Db 

Qy 
Db 

Qy 

Db 

Qy 
Db 

Qy 

Db 



464 



2197 



524 



I I I I I I I I I I I I I I I I I I I I I I III II I I I I I I I II I 

TTACTCTCATTTTTCTGCCTCCTCCCCTTGACCAAGAATGGACAGTTGAAGGAGATATAA 



523 



CCCAATGGCATAAGTTTGGAATTTTTAATTTTGGTTTTTCCT TTTGTTTATGGGGTT 2253 

III I I I I II I I I III II I I I I I I I I I I I I I I I I I I I II I I I I 
CCCGGTGGCTTATGTTAAGAAATTATCCTTTTCCCTTTCCTTTTGTTTGTTTATGGGGTT 583 



2254 — GGGGGGAATGGCAGATTTATAT GACTTTT CACTCAAAT CT ATATGTGCCAGTTTATAT 2311 

I I I I I I I I I I I I MM I I I I I I I I I I I I I I I I I I I I I I I II I II I M III 

584 GAGGGGAGAATGGCAAAT TT GT AT GATT TTT C ACTAAAAT CT CT AT GT G C C AGGTT CT AT 643 
2312 T GACT CC GT AT GC AT GAGT ATTT GT G CAACACAAGCACAACTAAGTAT GT ATATACA 2368 

I I I II M II I I I I I I I II I I III I I I I I MM Ml III 

644 T GACT TT GT AT GCAT GAGCC GTT CT GACACAAGC AC AGT AT AT GT CT GTAT AT ATGC AC A 703 

2369 CAT GAC GC AC AC GAT GC C AGGGCCT AGACCT CC CAAGGGCT GT GCT C CTG CT C C C AGCAG 2428 
I I I I II M I I I I I II I I I I I I I I I I I I II I I II I 

704 AAGAATGCACACGACCTAAGGGC — TGGACAGCAGAGGGCTAACATCTTACTAT CAGCT G 761 

2429 CCCTCTCTTAGAATATTTCAGATGGATGAGCTTCTGACTCTTTCTTAAAATTCTTTTGGG 2488

I I I I I III I I I I I II I I I I I II I I I I I M I I M I II MM 

762 CCC-CTACAAGAGCATTTCAGA-CAACAAGCCTCTGGCTATTTATT-AAACCCTCCTGGG 818 

2489 AAGATTTCCCAGCCTTTCTT CACAACACTTTCTAACATCAAAT GACT CT CAT CAT C 2544 

I I I I I II II II II II II I I III II I M I I I II II I I 

819 CAAATTTCCCAGCCTCCCCTGGCAGGCAACCTTTTTAAAGCTGAATTAGGCCCCATCATC 878 

2545 AACAAATTGTATTCCTTATTGTGAAATTAATACCCTCAGGCTCCATTTTACTGCTTTGCT 2604 

I I I II I I II II I I II III I I I I I I I I I II M I M I I 

879 AAAAAATTCCCTTCCTAATTTGAAAAAAAAAACCCCCAGGCTCCCTTGGAAATAATAAGG 938 

2605 CT T T GT CT GC AT TAAGAGAGGAT GAGGAGA 2634 
I III II I I II I I M I I 
939 CCCTTTCCCCACCTAAAGAACCTGGGGAAA 968 



RESULT 13 

AA332022 

LOCUS       AA332022                 319 bp    mRNA    linear   EST 21-APR-1997
DEFINITION  EST35911 Embryo, 8 week I Homo sapiens cDNA 5' end, mRNA sequence.
ACCESSION   AA332022
VERSION     AA332022.1  GI:1984264
KEYWORDS    EST.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
REFERENCE   1  (bases 1 to 319)
  AUTHORS   Adams,M.D., Kerlavage,A.R., Fleischmann,R.D., Fuldner,R.A.,
            Bult,C.J., Lee,N.H., Kirkness,E.F., Weinstock,K.G., Gocayne,J.D.,
            White,O., Sutton,G., Blake,J.A., Brandon,R.C., Man-Wai,C.,
            Clayton,R.A., Cline,T.R., Cotton,M.D., Earle-Hughes,J., Fine,L.D.,
            Fitzgerald,L.M., Fitzhugh,W.M., Fritchman,J.L., Geoghagen,N.S.,
            Glodek,A., Gnehm,C.L., Hanna,M.C., Hedblom,E., Hinkle,P.S.Jr.,
            Kelley,J.M., Kelley,J.C., Liu,L.-I., Marmaros,S.M., Merrick,J.M.,
            Moreno-Palanques,R.F., McDonald,L.A., Nguyen,D.T., Pelligrino,S.M.,
            Phillips,C.A., Ryder,S.E., Scott,J.L., Saudek,D.M., Shirley,R.,
            Small,K.V., Spriggs,T.A., Utterback,T.R., Weidman,J.F., Li,Y.,
            Bednarik,D.P., Cao,L., Cepeda,M.A., Coleman,T.A., Collins,E.J.,
            Dimke,D., Feng,D.-F., Ferrie,A., Fischer,C., Hastings,G.A.,
            He,W.W., Hu,J.S., Greene,J.M., Gruber,J., Hudson,P., Kim,A.K.,
            Kozak,D.L., Kunsch,C., Hungjun,J., Li,H., Meissner,P.S., Olsen,H.,
            Raymond,L., Wei,Y.F., Wing,J., Xu,C., Yu,G.L., Ruben,S.M.,
            Dillion,P.J., Fannon,M.R., Rosen,C.A., Haseltine,W.A., Fields,C.,
            Fraser,C.M. and Venter,J.C.
  TITLE     Initial assessment of human gene diversity and expression patterns
            based upon 83 million nucleotides of cDNA sequence
  JOURNAL   Nature 377 (6547 Suppl), 3-174 (1995)
  MEDLINE   96026280
   PUBMED   7566098
COMMENT     Contact: Kerlavage, AR
            Bioinformatics
            The Institute for Genomic Research
            9712 Medical Center Drive, Rockville, MD 20850 USA
            Tel: 3018699056
            Fax: 3018699423
            Email: arkerlav@tigr.org
            For clone availability, additional sequence and expression
            information related to this EST, please check the TIGR Human Gene
            Index (http://www.tigr.org/tdb/hgi/hgi.html)
            Seq primer: M13 Reverse.
FEATURES             Location/Qualifiers
     source          1..319
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="ATCC(inhost):133650"
                     /db_xref="taxon:9606"
                     /dev_stage="embryo, 8 wks"
                     /clone_lib="Embryo, 8 week I"
                     /note="Organ: Embryo, 8 weeks; Vector: pBluescript SK-;
                     Site_1: EcoRI; Site_2: XhoI"
BASE COUNT       99 a     61 c     47 g    112 t
ORIGIN



Query Match 8.8%; Score 305.4; DB 9; Length 319;
Best Local Similarity 99.4%; Pred. No. 1.4e-23;
Matches 317; Conservative 0; Mismatches 1; Indels 1; Gaps 1;
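The percentages in these summary blocks appear to be simple ratios. The sketch below assumes that Query Match is the score divided by the perfect score (3468, as reported in the header of this search) and that Best Local Similarity is the number of matches divided by the aligned columns (matches + mismatches + indels). Both formulas are inferred from the reported figures rather than taken from the search program's documentation, and the function names are hypothetical.

```python
# Assumed formulas, inferred from the figures printed in this listing.
PERFECT_SCORE = 3468  # "Perfect score" from the search header


def query_match(score, perfect=PERFECT_SCORE):
    """Percentage of the perfect score reached by a hit (assumed definition)."""
    return 100.0 * score / perfect


def best_local_similarity(matches, mismatches, indels):
    """Percentage of aligned columns that are identical (assumed definition)."""
    return 100.0 * matches / (matches + mismatches + indels)


# Figures reported for RESULT 13 (AA332022).
print(round(query_match(305.4), 1))                 # 8.8
print(round(best_local_similarity(317, 1, 1), 1))   # 99.4
```

The same two ratios also reproduce the values printed for RESULT 14 (286.2 and 360/63/16) and RESULT 15 (286 and 347/60/15).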



Qy 1842 TTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGAT 1901
        ||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||
Db    1 TTTGAAATTAACCTCCTAAAAGGAATTCATATTTTAAAGGAAAAAAATACAACTAATGAT 60

Qy 1902 GCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCAC 1961
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 GCACATTTCTTAGAACACAATAGTCCATTGATATACTACTGCCTACTTTACCTAGTTCAC 120

Qy 1962 CTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTT 2021
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 CTTAACATGTAAATCCACAGGGTAGATTTCTTTCTAGATGTGGAAGTACAAGAAAATCTT 180

Qy 2022 TTTTAGTTATTTGTTTGTTTACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAG 2081
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 TTTTAGTTATTTGTTTGTTTACTTCGTCCCATGTGCTAACTATCTTATATATAATGAGAG 240

Qy 2082 CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGG-TTTTT 2140
        |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
Db  241 CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTT 300

Qy 2141 TCTCTCATCCTTCTACCTC 2159
        |||||||||||||||||||
Db  301 TCTCTCATCCTTCTACCTC 319
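The marker row between each Qy and Db line can be regenerated from the two aligned rows, which is a convenient way to check the Matches, Mismatches and Indels totals. The following is a minimal sketch, assuming that `|` marks an identical column, a blank marks a mismatch or gap, and `-` is the gap character, as this listing appears to use. The helper names `match_line` and `tally` are hypothetical.

```python
def match_line(qy, db):
    """Build the marker row: '|' for identical residues, blank otherwise."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    return "".join("|" if a == b and a != "-" else " " for a, b in zip(qy, db))


def tally(qy, db):
    """Count (matches, mismatches, indels) over one pair of aligned rows."""
    if len(qy) != len(db):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = indels = 0
    for a, b in zip(qy, db):
        if a == "-" or b == "-":
            indels += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches, mismatches, indels


# Row Qy 2082-2140 / Db 241-300 of RESULT 13, with OCR spacing removed.
qy = "CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGG-TTTTT"
db = "CCAGCTACGTAAAAGTAGCTGAGAGGCCTTGGGAGTCATTTATCCCAAACTGGGTTTTTT"
print(match_line(qy, db))
print(tally(qy, db))  # (59, 0, 1)
```

Summing `tally` over all six rows of RESULT 13 gives 317 matches, 1 mismatch and 1 indel, in agreement with the summary block.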



RESULT 14 

BY706433 

LOCUS       BY706433                 424 bp    mRNA    linear   EST 16-DEC-2002
DEFINITION  BY706433 RIKEN full-length enriched, adult male testis Mus musculus
            cDNA clone 1700026A08 5', mRNA sequence.
ACCESSION   BY706433
VERSION     BY706433.1  GI:27117598
KEYWORDS    EST.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 424)
  AUTHORS   Okazaki,Y., Furuno,M., Kasukawa,T., Adachi,J., Bono,H., Kondo,S.,
            Nikaido,I., Osato,N., Saito,R., Suzuki,H., Yamanaka,I.,
            Kiyosawa,H., Yagi,K., Tomaru,Y., Hasegawa,Y., Nogami,A.,
            Schonbach,C., Gojobori,T., Baldarelli,R., Hill,D.P., Bult,C.,
            Hume,D.A., Quackenbush,J., Schriml,L.M., Kanapin,A., Matsuda,H.,
            Batalov,S., Beisel,K.W., Blake,J.A., Bradt,D., Brusic,V.,
            Chothia,C., Corbani,L.E., Cousins,S., Dalla,E., Dragani,T.A.,
            Fletcher,C.F., Forrest,A., Frazer,K.S., Gaasterland,T.,
            Gariboldi,M., Gissi,C., Godzik,A., Gough,J., Grimmond,S.,
            Gustincich,S., Hirokawa,N., Jackson,I.J., Jarvis,E.D., Kanai,A.,
            Kawaji,H., Kawasawa,Y., Kedzierski,R.M., King,B.L., Konagaya,A.,
            Kurochkin,I.V., Lee,Y., Lenhard,B., Lyons,P.A., Maglott,D.R.,
            Maltais,L., Marchionni,L., McKenzie,L., Miki,H., Nagashima,T.,
            Numata,K., Okido,T., Pavan,W.J., Pertea,G., Pesole,G.,
            Petrovsky,N., Pillai,R., Pontius,J.U., Qi,D., Ramachandran,S.,
            Ravasi,T., Reed,J.C., Reed,D.J., Reid,J., Ring,B.Z., Ringwald,M.,
            Sandelin,A., Schneider,C., Semple,C.A., Setou,M., Shimada,K.,
            Sultana,R., Takenaka,Y., Taylor,M.S., Teasdale,R.D., Tomita,M.,
            Verardo,R., Wagner,L., Wahlestedt,C., Wang,Y., Watanabe,Y.,
            Wells,C., Wilming,L.G., Wynshaw-Boris,A., Yanagisawa,M., Yang,I.,
            Yang,L., Yuan,Z., Zavolan,M., Zhu,Y., Zimmer,A., Carninci,P.,
            Hayatsu,N., Hirozane-Kishikawa,T., Konno,H., Nakamura,M.,
            Sakazume,N., Sato,K., Shiraki,T., Waki,K., Kawai,J., Aizawa,K.,
            Arakawa,T., Fukuda,S., Hara,A., Hashizume,W., Imotani,K.,
            Ishii,Y., Itoh,M., Kagawa,I., Miyazaki,A., Sakai,K., Sasaki,D.,
            Shibata
  TITLE
  JOURNAL
  MEDLINE
   PUBMED
COMMENT
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FEATURES Location/Qualifiers 
     source          1..424
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="taxon:10090"
                     /clone="1700026A08"
                     /sex="male"
                     /tissue_type="testis"
                     /dev_stage="adult"
                     /lab_host="SOLR"
                     /clone_lib="RIKEN full-length enriched, adult male testis"
                     /note="Site_1: XhoI; Site_2: BamHI; cDNA library was
                     prepared and sequenced in Mouse Genome Encyclopedia
                     Project of Genome Exploration Research Group in Riken
                     Genomic Sciences Center and Genome Science Laboratory in
                     RIKEN. Division of Experimental Animal Research in Riken
                     contributed to prepare mouse tissues. 1st strand cDNA was
                     primed with a primer [5'
                     GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
                     prepared by using trehalose thermo-activated reverse
                     transcriptase and subsequently enriched for full-length by
                     cap-trapper. Second strand cDNA was prepared with the
                     primer adapter of sequence [5'
                     GAGAGAGAGAGCGGCCGCAATTAATTCTCGAGTTAATTAAATTAATCCCCCCCCCCC
                     3']. cDNA was cloned into the XhoI and BamHI sites."
BASE COUNT      133 a     98 c    114 g     79 t
ORIGIN



Query Match 8.3%; Score 286.2; DB 14; Length 424;
Best Local Similarity 82.0%; Pred. No. 1.3e-21;
Matches 360; Conservative 0; Mismatches 63; Indels 16; Gaps 2;



Qy 1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526
        |||| |||||||||||||| ||||||||||| |||||||| || || || ||||||| ||
Db    2 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 61

Qy 1527 ACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAG 1586
        |||||||||||||||| || ||||||||||| || || || |||||||||||||||||||
Db   62 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 121

Qy 1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646
        ||||| |||||||||| ||| | ||||||| |||||| |||||||||||||| |||||||
Db  122 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACGTCAGGCTCCAGGGAGTCGAACA 181

Qy 1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706
        | ||||||||||| |||||||  || ||||||||||| || ||||| |||||||||||||
Db  182 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGCTCTCCATCGAGGAAGAGCTGGAGAAAT 241

Qy 1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
        |||||||||||||| | || | ||||||||||||||| || |||||| ||||||||||||
Db  242 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAGCGAAAACATCCTT 301

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826
        |||| |||||||||||| |||||||||||||||| |||  ||   | ||
Db  302 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG----------- 350

Qy 1827 AAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAA 1886
              |   | || |||||||| ||||||   |||||||  ||||| ||||||||||||
Db  351 ----GTAGTCGCCACTTTGAAATAAACCTCCCCAAAGGAAGACATATGTTAAAGGAAAAA 406

Qy 1887 AATACAACTAATGATGCAC 1905
         | |||||||| | | |||
Db  407 TA-ACAACTAACGGTCCAC 424



RESULT 15 

AK006368 

LOCUS       AK006368                 422 bp    mRNA    linear   HTC 05-DEC-2002
DEFINITION  Mus musculus adult male testis cDNA, RIKEN full-length enriched
            library, clone:1700026A08 product:inferred: RIKEN cDNA 4930434H12
            gene / putative [Mus musculus], full insert sequence.
ACCESSION   AK006368
VERSION     AK006368.1  GI:12839431
KEYWORDS    HTC; CAP trapper.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
REFERENCE   1
  AUTHORS   Carninci,P. and Hayashizaki,Y.
  TITLE     High-efficiency full-length cDNA cloning
  JOURNAL   Meth. Enzymol. 303, 19-44 (1999)



MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 

TITLE 

JOURNAL 
MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 



TITLE 

JOURNAL 
MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 



TITLE 
JOURNAL 
MEDLINE 
PUBMED 
REFERENCE 
AUTHORS 

TITLE 

JOURNAL 
REFERENCE 
AUTHORS 



99279253 
10349636 
2 

Carninci, P., Shibata,Y., Hayatsu,N., Sugahara,Y., Shibata, K., 

Itoh,M., Konno,H., Okazaki,Y., Muramatsu, M. and Hayashizaki, Y. 

Normalization and subtraction of cap-trapper-selected cDNAs to 

prepare full-length cDNA libraries for rapid discovery of new genes 

Genome Res. 10 (10), 1617-1630 (2000) 

20499374 

11042159 
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Shibata, K. , 
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RIKEN integrated sequence analysis (RISA) system — 384-format 
sequencing pipeline with 384 multicapillary sequencer 
Genome Res. 10 (11), 1757-1771 (2000) 
20530913 
11076861 
4 
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Kadota,K., Matsuda,H., Ashburner,M. , Batalov, S . , Casavant,T., 
Fleischmann, W. , Gaasterland, T . , Gissi,C, King,B., Kochiwa,H., 
Kuehl,P., Lewis, S., Matsuo,Y., Nikaido,I., Pesole,G., 
Quackenbush, J. , Schriml, L.M. , Staubli,F., Suzuki, R. , Tomita,M., 
Wagner, L . , Washio,T., Sakai,K., Okido,T., Furuno,M. , Aono,H., 
Baldarelli, R. , Barsh,G., Blake, J., Boffelli,D., Bojunga,N., 
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Hofmann,M., Hume, D. A., Kamiya,M., Lee,N.H., Lyons, P., 
Marchionni, L. , Mashima,J., Mazzarelli, J. , Mombaerts , P . , Nordone,P., 
Ring,B., Ringwald,M., Rodriguez, I . , Sakamoto, N-, Sasaki, H., 
Sato,K., Schonbach, C. , Seya,T., Shibata, Y., Storch,K.F., Suzuki, H., 
Toyo-oka,K., Wang,K.H., Weitz,C, Whittaker , C . , Wilming,L., 
Wynshaw-Boris, A. , Yoshida,K., Hasegawa,Y., Kawaji,H., Kohtsuki,S. 
and Hayashizaki, Y. 

Functional annotation of a full-length mouse cDNA collection 

Nature 409 (6821), 685-690 (2001) 

21085660 

11217851 
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The FANTOM Consortium and the RIKEN Genome Exploration Research 
Group Phase I & II Team. 

Analysis of the mouse transcriptome based on functional annotation 
of 60,770 full-length cDNAs 
Nature 420, 563-573 (2002) 
REFERENCE   6  (bases 1 to 422)
  AUTHORS   Adachi,J., Aizawa,K., Akahira,S., Akimura,T., Arai,A., Aono,H.,
            Arakawa,T., Bono,H., Carninci,P., Fukuda,S., Fukunishi,Y.,
            Furuno,M., Hanagaki,T., Hara,A., Hayatsu,N., Hiramoto,K.,
            Hiraoka,T., Hori,F., Imotani,K., Ishii,Y., Itoh,M., Izawa,M.,
            Kasukawa,T., Kato,H., Kawai,J., Kojima,Y., Konno,H., Kouda,M.,
            Koya,S., Kurihara,C., Matsuyama,T., Miyazaki,A., Nishi,K.,
            Nomura,K., Numazaki,R., Ohno,M., Okazaki,Y., Okido,T., Owa,C.,
            Saito,H., Saito,R., Sakai,C., Sakai,K., Sano,H., Sasaki,D.,
            Shibata,K., Shibata,Y., Shinagawa,A., Shiraki,T., Sogabe,Y.,
            Suzuki,H., Tagami,M., Tagawa,A., Takahashi,F., Tanaka,T.,
            Tejima,Y., Toya,T., Yamamura,T., Yasunishi,A., Yoshida,K.,
            Yoshino,M., Muramatsu,M. and Hayashizaki,Y.
  TITLE     Direct Submission
  JOURNAL   Submitted (10-JUL-2000) Yoshihide Hayashizaki, The Institute of
            Physical and Chemical Research (RIKEN), Laboratory for Genome
            Exploration Research Group, RIKEN Genomic Sciences Center (GSC),
            RIKEN Yokohama Institute; 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama,
            Kanagawa 230-0045, Japan (E-mail:genome-res@gsc.riken.go.jp,
            URL:http://genome.gsc.riken.go.jp/, Tel:81-45-503-9222,
            Fax:81-45-503-9216)
COMMENT     Please visit our web site (http://genome.gsc.riken.go.jp/) for
            further details.
            cDNA library was prepared and sequenced in Mouse Genome
            Encyclopedia Project of Genome Exploration Research Group in Riken
            Genomic Sciences Center and Genome Science Laboratory in RIKEN.
            Division of Experimental Animal Research in Riken contributed to
            prepare mouse tissues. First strand cDNA was primed with a primer
            [5' GAGAGAGAGAAGGATCCAAGAGCTCTTTTTTTTTTTTTTTTVN 3'], cDNA was
            prepared by using trehalose thermo-activated reverse transcriptase
            and subsequently enriched for full-length by cap-trapper. Second
            strand cDNA was prepared with the primer adapter of sequence [5'
            GAGAGAGAGAGCGGCCGCAATTAATTCTCGAGTTAATTAAATTAATCCCCCCCCCCC 3']. cDNA
            was cleaved with XhoI and SstI. Cloning sites, 5' end: XhoI; 3'
            end: SstI. Host: SOLR.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Mus musculus"
                     /mol_type="mRNA"
                     /strain="C57BL/6J"
                     /db_xref="FANTOM_DB:1700026A08"
                     /db_xref="MGI:1893081"
                     /db_xref="taxon:10090"
                     /clone="1700026A08"
                     /sex="male"
                     /tissue_type="testis"
                     /clone_lib="RIKEN full-length enriched mouse cDNA library"
                     /dev_stage="adult"
     misc_feature    1..422
                     /note="inferred: RIKEN cDNA 4930434H12 gene / putative
                     [Mus musculus] (UniGene|Mm.46143, TIGR-MGI1|TC1870,
                     evidence: UG/TGI)"
                     /db_xref="MGI:1914659"
BASE COUNT      132 a     97 c    114 g     79 t
ORIGIN



Query Match 8.2%; Score 286; DB 11; Length 422;
Best Local Similarity 82.2%; Pred. No. 1.4e-21;
Matches 347; Conservative 0; Mismatches 60; Indels 15; Gaps 1;
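Because every summary block in this listing uses the same `name value;` layout, the fields can be pulled out with a single regular expression. The following is a minimal sketch using only Python's standard `re` module. The pattern, the function name `parse_stats` and the choice to return every value as a float are illustrative assumptions, and the sample string is the RESULT 15 block with its OCR damage repaired.

```python
import re

# A field is a label (letters, dots, spaces), whitespace, a number with an
# optional trailing '%', and a terminating ';'.
FIELD = re.compile(r"([A-Za-z][A-Za-z. ]*?)\s+([-+0-9.eE]+)%?;")


def parse_stats(text):
    """Return a {label: value} mapping for one summary block."""
    return {name.strip(): float(value) for name, value in FIELD.findall(text)}


stats = parse_stats(
    "Query Match 8.2%; Score 286; DB 11; Length 422; "
    "Best Local Similarity 82.2%; Pred. No. 1.4e-21; "
    "Matches 347; Conservative 0; Mismatches 60; Indels 15; Gaps 1;"
)
print(stats["Score"], stats["Pred. No."], stats["Indels"])
```

Parsed values like these make it straightforward to cross-check a block, for example that Matches + Mismatches + Indels equals the number of aligned columns.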



Qy 1467 AGTCTGAGGCCAGCTCTCCCCAGGAGACGGTCATCTGTGGTCCCGTGACACGCCAGACCA 1526
        |||| |||||||||||||| ||||||||||| |||||||| || || || ||||||| ||
Db    2 AGTCAGAGGCCAGCTCTCCGCAGGAGACGGTGATCTGTGGGCCTGTAACGCGCCAGAGCA 61

Qy 1527 ACATCCAGACTCTGGACCGTCCCATCAAGAAGGGCCCTGTCCAGCTGATCCAACAGTCAG 1586
        |||||||||||||||| || ||||||||||| || || || |||||||||||||||||||
Db   62 ACATCCAGACTCTGGATCGGCCCATCAAGAAAGGTCCGGTGCAGCTGATCCAACAGTCAG 121

Qy 1587 AGATGCGGCGGAAAAGCGACTTACTCCGGATTCTGACTTCAGGCTCCAGGGAATCGAACA 1646
        ||||| |||||||||| ||| | ||||||| |||||| |||||||||||||| |||||||
Db  122 AGATGAGGCGGAAAAGTGACCTGCTCCGGACTCTGACGTCAGGCTCCAGGGAGTCGAACA 181

Qy 1647 TGAGCAGCAAAAAAAAAGCTGTTAAAGAAAAGCTCTCAATTGAGGAGGAGCTGGAGAAAT 1706
        | ||||||||||| |||||||  || ||||||||||| || ||||| |||||||||||||
Db  182 TAAGCAGCAAAAAGAAAGCTGCGAAGGAAAAGCTCTCCATCGAGGAAGAGCTGGAGAAAT 241

Qy 1707 GTATCCAGGATTTCCTAAAAAAAAAAATTCCAGATCGGTTTCCTGAGAGAAAACATCCTT 1766
        |||||||||||||| | || | ||||||||||||||| || |||||| ||||||||||||
Db  242 GTATCCAGGATTTCTTGAAGATAAAAATTCCAGATCGCTTCCCTGAGCGAAAACATCCTT 301

Qy 1767 GGCAATCTGAACTTTTAAGGAAGTATCATCTATAAGGGAGGGCTGGGGGCGGGGAAAAAA 1826
        |||| |||||||||||| |||||||||||||||| |||  ||   | ||
Db  302 GGCAGTCTGAACTTTTACGGAAGTATCATCTATAGGGGGAGGGCTGTGG----------- 350

Qy 1827 AAAAAAAAGAGTCATTTTGAAATTAACCTCATAAAAGGAATTCATATTTTAAAGGAAAAA 1886
              |   | || |||||||| ||||||   |||||||  ||||| ||||||||||||
Db  351 ----GTAGTCGCCACTTTGAAATAAACCTCCCCAAAGGAAGACATATGTTAAAGGAAAAA 406

Qy 1887 AA 1888
         |
Db  407 TA 408



Search completed: January 28, 2004, 23:02:16 
Job time : 6872 secs



